GENERATION AND SELECTION OF NOVEL BINDING PROTEINS
BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to development of novel
binding proteins by an iterative process of
mutagenesis, expression, chromatographic selection, and
amplification.

Information Disclosure Statement

The amino acid sequence of a protein determines
its three-dimensional (3D) structure, which in turn
determines protein function (EPST63, ANFI73). A
widely accepted system of classifying protein structure
may be found in Schulz and Schirmer (SCHU79, Ch5).
Their classification system is adopted herein.



Shortle (SHOR85), Sauer and colleagues (PAKU86,
REID88), and Caruthers and colleagues (EISE85) have
shown that some residues on the polypeptide chain are
more important than others in determining the 3D
structure of a protein. The 3D structure is
essentially unaffected by the identity of the amino
acids at some loci; at other loci only one or a few
types of amino acid is allowed. In most cases, loci
where wide variety is allowed have the amino acid side
group directed toward the solvent. Loci where limited
variety is allowed frequently have the side group
directed toward other parts of the protein. Thus
substitutions of amino acids that are exposed to
solvent are less likely to affect the 3D structure than
are substitutions at internal loci. (See also SCHU79,
p159-171 and CREI84, p239-245, 314-315).
The secondary structure (helices, sheets, turns,
loops) of a protein is determined mostly by local
sequence. Certain amino acids tend to be correlated
with certain secondary structures, and the commonly used
Chou-Fasman (CHOU74, CHOU78a, CHOU78b) rules depend on
these correlations. The correlations between amino
acid type and secondary structure are not, however,
absolute, and every amino acid type has been observed
in helices and in both parallel and antiparallel
sheets. Kabsch and Sander (KABS84) report on
pentapeptides of identical sequence found in different
proteins; in some cases the conformations of the
pentapeptides are very different. Argos (ARGO87)
surveyed pentapeptides of similar sequence in different
proteins and found that the structures of the
sequence-similar subsequences were frequently different.

The residues that join helices to helices, helices
to sheets, and sheets to sheets are called turns and
loops and have recently been classified by Richardson
(RICH81), Thornton (THOR88), Sutcliffe et al. (SUTC87a)
and others. Insertions and deletions are more readily
tolerated in loops than elsewhere. Thornton et al.
(THOR88) have summarized many observations indicating
that related proteins usually differ most at the loops
which join the more regular elements of secondary
structure.

When the amino acid sequence of one protein has
been changed to be more like the sequence of a second
protein, the properties of the novel protein usually
approach the properties of the second protein. Wells
et al. (WELL87a) reported that changing three residues
in subtilisin from Bacillus amyloliquefaciens to be the
same as the corresponding residues in subtilisin from
B. licheniformis produced a protease that had nearly
the same activity as the subtilisin from the latter
organism. There were 82 differences remaining in the
sequences. The three residues changed were chosen
because they were the only differences within 7
Angstroms (A) of the active site.

Many proteins bind non-covalently but very tightly
and specifically to some other characteristic
molecules. Schulz and Schirmer summarize many
observations on the binding of proteins to other
proteins (SCHU79, p98-105). For example, haemoglobin
alpha chains bind very tightly to haemoglobin beta
chains (delta G less than -11.0 Kcal/mole); antibodies
bind tightly to antigens (Kd values range from 10^-6 to
10^-[illegible] M, where Kd is the dissociation
constant, equal to [A][B]/[A:B]); basic bovine
pancreatic trypsin inhibitor (BPTI) binds tightly to
trypsin (Kd = 6.0 x 10^-14 M (TSCH87), delta G = -18.0
Kcal/mole); and avidin binds to biotin (Kd = 1.3 x
10^-15 M (CREI84, p362)).
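
The relation between a dissociation constant and a binding free energy can be sketched numerically. The following is a minimal illustration, assuming the standard relation delta G = R*T*ln(Kd) with Kd expressed in mol/L and a temperature of 298.15 K; the function names and the temperature are assumptions made for this example and do not come from the cited references.

```python
import math

# Gas constant in kcal/(mol*K); the free energies above are quoted in Kcal/mole.
R_KCAL = 1.987e-3


def dissociation_constant(conc_a, conc_b, conc_ab):
    """Kd = [A][B]/[A:B], with all equilibrium concentrations in mol/L."""
    return conc_a * conc_b / conc_ab


def binding_free_energy(k_d, temperature_k=298.15):
    """Standard free energy of binding in kcal/mol: delta G = R*T*ln(Kd).

    The default temperature is an assumption chosen for illustration.
    """
    return R_KCAL * temperature_k * math.log(k_d)


# Dissociation constants quoted in the text above.
for name, k_d in [("BPTI-trypsin", 6.0e-14), ("avidin-biotin", 1.3e-15)]:
    dg = binding_free_energy(k_d)
    print(f"{name}: Kd = {k_d:.1e} M, delta G = {dg:.1f} kcal/mol")
```

A smaller Kd gives a more negative delta G, so the avidin-biotin pair comes out as the tighter of the two complexes. Values measured at other temperatures or under other conditions will differ somewhat from this estimate.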

In each case the binding results from
complementarity of the surfaces that come into contact:
bumps fit into holes, unlike charges come together,
dipoles align, and hydrophobic atoms contact other
hydrophobic atoms. Although bulk water is excluded,
individual water molecules are frequently found filling
space in intermolecular interfaces; these waters
usually form hydrogen bonds to one or more atoms of the
protein or to other bound water. Thus proteins found
in nature have not attained, nor do they require,
perfect complementarity to bind tightly and
specifically to their substrates. Only in rare cases
is there essentially perfect complementarity; then the
binding is extremely tight (as, for example, avidin
binding to biotin).

The relative importance of electrostatic vs.
hydrophobic interactions is not fully understood
(SCHU79, p105). Attraction between oppositely charged
groups apparently contributes little to the free energy
of binding between proteins and other molecules.
Like-charged groups can, however, increase specificity:
repulsion of like-charged groups in the binding
interface or even unpaired charges in the interface can
greatly reduce or eliminate binding in instances where
shape and hydrophobic interactions would otherwise
induce it.

It has been observed, however, that proteins can
bind to other molecules such that like-charged groups
are juxtaposed; in such instances repulsion is reduced
or eliminated by inclusion of oppositely charged ions
in the binding interface. An example of this
phenomenon is the inclusion of two positively charged
calcium ions between each pair of subunits of turnip
crinkle virus (HOGL83). The subunits each contain two
negatively charged D (single-letter amino acid codes
are given in Table 1) and E residues in close
proximity.

The factors affecting protein binding are known
(CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch8), but
designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the
effects of these forces.

Recently, Quiocho and collaborators (QUIO87)
elucidated the structures of several periplasmic
binding proteins from Gram-negative bacteria. They
found that the proteins, despite having low sequence
homology and differences in structural detail, have
certain important similarities. Each of the proteins
they investigated is composed of two domains that are
joined by three strands of protein. The binding site
is located between the two domains and is isolated from
bulk solvent. The structure of the binding site is
dense and highly ordered, and binding constants are
very high. The researchers suggest that binding of
ligands causes a conformational change that alters the
relative positions of the two domains.

The researchers found that each of the periplasmic
binding proteins has numerous residues (seven or more)
arrayed about the binding site. Surprisingly, ionic
ligands are not bound by ionic side groups of opposite
charge, but by main-chain components. Electrical
charge seems to be neutralized by dipole interactions.
Further, hydrophobic contacts play an important role in
binding.



Based on their investigations of these binding
proteins, Quiocho et al. suggest it is unlikely that,
using current protein engineering methods, proteins can
be constructed with binding properties superior to
those of proteins that occur naturally.

Wilkinson et al. (WILK84) have found, however,
that enzyme-substrate affinity may be increased by
protein engineering. They reported that a mutant of
tyrosyl tRNA synthetase of Bacillus stearothermophilus
that has proline at residue 51 exhibits a 100-fold
increase in affinity for ATP.

Substitution of one amino acid for another at a
surface locus may profoundly alter binding properties
of the protein other than substrate binding, without
affecting the tertiary structure of the protein. For
example, in sickle-cell haemoglobin the change of the
surface residue E6 to V in the beta chains causes
deoxyhaemoglobin-S to form fibers through self binding
(DICK83, p125-145). Love and others have shown that
the tertiary and quaternary structure of the
haemoglobin are not changed (PADL85, WISH73, WISH76).

Tan and Kaiser (TANK77) and Tschesche et al.
(TSCH87) showed that changing a single amino acid in
BPTI greatly reduces its binding to trypsin, but that
some of the new molecules retain the parental
characteristics of binding to and inhibiting
chymotrypsin, while others exhibit new binding to
elastase. Caruthers and others have shown
that changes of single amino acids on the surface of
the lambda Cro repressor greatly reduce its affinity
for the natural operator OR3, but greatly increase the
binding of the mutant protein to a mutant operator.
Thus changing the surface of a binding protein may
alter its specificity without abolishing binding
activity.

The recently developed techniques of "reverse
genetics" have been used to produce single specific
mutations at precise base pair loci (OLIP86, OLIP87,
and AUSU87). Mutations are generally detected by
sequencing and in some cases by loss of wild-type
function. These procedures allow researchers to
analyze the function of each residue in a protein
(MILL88) or of each base pair in a regulatory DNA
sequence (CHEN88). In these analyses, the norm has
been to strive for the classical goal of obtaining
mutants carrying a single alteration (AUSU87).

Reverse genetics is frequently applied to coding
regions to determine which residues are most important
to the protein structure and function. In such
studies, isolation of a single mutant at each residue
of the protein gives an initial estimate of which
residues play crucial roles.

Prior to the method of the present invention, two
general approaches have been developed to create novel
mutant proteins through reverse genetics. Both methods
start with a clone of the gene of interest. In one
approach, dubbed "protein surgery" (reviewed by Dill
(DILL87)), a specific substitution is introduced at a
single protein residue by a synthetic method using the
corresponding natural or synthetic cloned gene. Craik
et al. (CRAI85), Rao et al. (RAOS87), and Bash et al.
(BASH87) have used this approach to determine the
effects on structure and function of specific
substitutions in trypsin.

The other approach has been to generate a variety
of mutants at many loci within the cloned gene, the
"gene-directed random mutagenesis" method. The
specific location and nature of the change are
determined by DNA sequencing. It may be possible to
screen for mutations if loss of a wild-type function
confers a cellular phenotype. Using
immunoprecipitation, one can then differentiate among
mutant proteins that: a) fold but fail to function, b)
fail to fold but persist, and c) are degraded, perhaps
due to failure to fold. This approach is exemplified
by the work of Pakula et al. (PAKU86) on the effect of
point mutations on the structure and function of the
Cro protein from bacteriophage lambda. This approach
is limited by the number of colonies that can be
examined. An additional important limitation is that
many desirable protein alterations require multiple
amino acid substitutions and thus are not accessible
through single base changes or even through all
possible amino acid substitutions at any one residue.

The objective in both the surgical and gene-
directed random mutagenesis approaches has been,
however, to analyze the effects of a variety of single
substitution mutations, so that rules governing such
substitutions could be developed (ULME83). Progress
has been greatly hampered by the extensive efforts
involved in using either method and the practical
limitations on the number of colonies that can be
inspected (ROBE86).
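
The restriction imposed by single base changes can be made concrete with the standard genetic code. The sketch below is an illustration only and is not taken from the cited work; it lists the amino acids reachable from one codon by changing exactly one base. The function name single_base_neighbours and the choice of GAG (a glutamate codon) as the example are assumptions made here.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def single_base_neighbours(codon):
    """Amino acids reachable from `codon` by changing exactly one base.

    Stop codons and synonymous changes are left out of the result.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid != "*" and amino_acid != original:
                reachable.add(amino_acid)
    return reachable


print(sorted(single_base_neighbours("GAG")))  # ['A', 'D', 'G', 'K', 'Q', 'V']
```

For this codon only six of the nineteen other amino acids are reachable, so most substitutions at such a residue would need two or three base changes within the codon.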

The term "saturation mutagenesis" with reference
to synthetic DNA is generally taken to mean generation
of a population in which: a) every possible single-base
change within a fragment of a gene or DNA regulatory
region is represented, and b) most mutant genes contain
only one mutation. Thus a set of all possible single
mutations for a 6 base pair length of DNA comprises a
population of 18 mutants. Oliphant et al. (OLIP86) and
Oliphant and Struhl (OLIP87) have demonstrated ligation
and cloning of highly degenerate oligonucleotides and
have applied saturation mutagenesis to the study of
promoter sequence and function. They have suggested
that similar methods could be used to study genetic
expression of protein coding regions of genes, but they
do not say how one should: a) choose protein residues
to vary, or b) select or screen mutants with desirable
properties.

Reidhaar-Olson and Sauer (REID88) have used
synthetic degenerate oligo-nts to vary simultaneously
two or three residues through all twenty amino acids in
the dimer interface of cI repressor from bacteriophage
lambda. They give no discussion of the limits on how
many residues could be varied at once nor do they
mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerization or that
did not dimerize. They did not seek proteins having
novel binding properties and did not find any.

Several researchers have designed and synthesized
proteins de novo. These designed proteins are small
and most have been synthesized in vitro as polypeptides
rather than genetically. Gutte and colleagues have
made a polypeptide that binds DDT in 50% ethanol
(MOSE83). Recently Moser et al. (MOSE87) reported
genetic expression in E. coli both of the designed 24
residue DDT-binding protein and of fusions of the DDT-
binding sequence to LacZ. They state that design of
biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and
synthesized a series of proteins that they have named
betabellins, that are meant to have beta sheets. They
suggest use of polypeptide synthesis with mixed
reagents to produce several hundred analogous
betabellins. They suggest the mixture be passed over a
column to recover the analogues with high affinity for
a chosen target compound bound to the column. They
envision successive rounds of mixed synthesis of
variant proteins and purification by specific binding.
They do not discuss how residues should be chosen for
variation. Because proteins cannot be amplified, the
researchers must sequence the recovered protein to
learn which substitutions improve binding. The
researchers must limit the level of diversity so that
each variety of protein will be present in sufficient
quantity for the isolated fraction to be sequenced.
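
Two of the counting arguments in the preceding paragraphs can be checked with a few lines of code. The sketch below is illustrative and rests on stated assumptions: the 6 base pair fragment GAATTC is an arbitrary example, the function names are hypothetical, and the codon count assumes fully random codons (equal amounts of all four bases at all three positions), which is only one possible way of making degenerate DNA.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def single_base_mutants(seq):
    """Every sequence that differs from `seq` at exactly one position."""
    return [
        seq[:i] + base + seq[i + 1:]
        for i in range(len(seq))
        for base in BASES
        if base != seq[i]
    ]


def codons_per_amino_acid():
    """Number of the 64 codons that encode each amino acid ('*' is stop)."""
    counts = Counter()
    # product() yields TTT, TTC, TTA, TTG, TCT, ... in the same order as AMINO.
    for index, _codon in enumerate(product(BASES, repeat=3)):
        counts[AMINO[index]] += 1
    return counts


mutants = single_base_mutants("GAATTC")
print(len(mutants))  # 18: three alternative bases at each of six positions

counts = codons_per_amino_acid()
print(counts["L"], counts["M"], counts["W"], counts["*"])  # 6 1 1 3
```

With fully random codons, leucine is encoded six times as often as methionine or tryptophan, which is the kind of unequal abundance referred to above. Varying two or three residues through all twenty amino acids gives 20^2 = 400 or 20^3 = 8000 protein sequences.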

A number of methods have been developed to
separate cells through their affinity to various
substances. Bonnafous et al. (BONN85) review methods
that have been applied to animal cells, and cite two
common problems: a) non-specific interactions between
cells and affinity supports, and b) irreversible
binding of cells to affinity matrices. Possible
reasons for irreversible binding include multiple
points of attachment and very high affinity between
cells and antibodies used as affinity materials.
Chromatographic separation of animal cells is still
difficult because of their fragility. Bacterial cells,
bacterial spores, and some bacteriophage, however, are
sturdier than animal cells and have been fractionated
based on proteins displayed on their surfaces.



Ferenci and collaborators have published a series
of papers on the chromatographic isolation of mutants
of the maltose-transport protein LamB of E. coli
(WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b,
CLUN84, FERE86a, FERE86b, FERE86c, FERE87a,
FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic
locus can be isolated by chromatography over a column
supporting immobilized maltose, maltodextrins, or
starch, i.e. carbohydrates that could be metabolized by
the bacteria. The reports speculate that other
applications are possible, but specifically mention
only the elucidation of the residues responsible for
the selectivity of the maltodextrin pore or similar
pore proteins.

Ferenci's experiments measured the combination of
the individual affinity of mutant LamB molecules and
the level of expression. Several classes of mutants in
lamB were isolated. One class had higher affinities
for both maltose and starch, one class had lower
affinity for starch but higher affinity for maltose,
and another class had higher affinity for starch but
lower affinity for maltose.

Mutants were generated either by hydroxylamine
treatment of a plasmid carrying the entire gene, or by
insertions of two extra codons at natural HpaII sites.
Levels of mutagenesis were picked to provide single
point mutations or single insertions of two residues.
No multiple mutations were sought or found.

LamB is a large trimeric integral membrane
protein; such proteins are very difficult to
crystallize or even to solubilize. Therefore it is
difficult to use single-crystal protein X-ray
crystallography or NMR to obtain detailed 3D structural
information. Garavito et al. (GARA83) have obtained
crystals of LamB that diffract X-rays, but the 3D
structure of the protein has not yet been determined.
There are models (GEHR87, HEIN88) that include the
secondary structure of LamB, i.e. they specify which
residues are in beta-sheet conformation, which residues
are in turns, and which residues are on the outside, in
the periplasm, or in the membrane. These models do not
specify how the beta sheets are arranged nor which
turns are close to which other turns.

Ferenci and Lee (FERE86a) reported on the
temperature sensitivity of carbohydrate binding in B.
stearothermophilus. At higher temperatures, the
organism breaks down the polysaccharide, the binding of
which was the object of the study. Clune, Lee, and
Ferenci (CLUN84) reported that presence of complete O-
antigen affected the binding properties of LamB on the
surface of E. coli. Both of these reports point up the
difficulties of working with live bacteria that can
metabolize chemicals and change their physiological
behavior during the chromatographic experiment. Heine
et al. (HEIN88) have used the chemotaxis of E. coli
recently to isolate mutants in lamB that are unaffected
in chemotaxis: this approach is limited to metabolites
that affect chemotaxis.

Makela et al. (MAKE80) reviewed methods that
involve chemically coupling antigens to bacteriophage
to produce a sensitive, quantitative detection system
for antibodies. The methods reviewed exploit the
ability to amplify the signal produced by antibodies
binding to the antigens coupled to the phage, through
growth of the phage. The antigens were attached to the
phage chemically and were not encoded by the
phage. Thus there was no sorting of genetic material.
Furthermore, the objectives of the methods reviewed
involve titering the phage that fail to bind, as an
assay of antibody. The methods of the present
invention, in most cases, involve growth and
amplification of genetic packages that bind with high
affinity.

In 1985 Smith (SMIT85) reported inserting a
heterologous gene into gene III of bacteriophage f1.
The gene III protein is a minor coat protein necessary
for infectivity. In some cases the inserted gene
preserved the original reading frame, leading to
expression of heterologous protein as an inserted
domain in the gene III protein. Smith demonstrated
that the resulting f1 virions are adsorbed by
antibody against the protein encoded by the
heterologous DNA. The antibody was bound to a
polystyrene petri dish. The phage were eluted at pH
2.2 and retained some infectivity. However, the single
copy of f1 gene III was used for insertion of the
heterologous gene so that all copies of gene III
protein were affected; infectivity of the resultant
phage was reduced 25-fold. Smith also demonstrated
that batch elution from a plate can separate f1 virions
that differ by only a few protein domains on their
surfaces.



Smith presented his method as a way to isolate
cloned genes using antibodies to the gene products. He
made no mention of mutagenizing the inserted genetic
material or of inducing novel binding properties in the
inserted protein domain.

De la Cruz et al. (CRUZ88) have expressed a
fragment of the repeat region of the circumsporozoite
protein from Plasmodium falciparum on the surface of
M13 as an insert in the gene III protein. They showed
that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage
could be used for B epitope mapping. The researchers
suggest that similar recombinant phage could be used
for T epitope mapping and for vaccine development.
They do not suggest mutagenesis of the inserted
material.

Gene fragments coding for portions of hepatitis B
virus antigens have been fused to fragments of lamB.
If the point of fusion is in a region coding for
exposed domains of LamB, the HBV antigens appear on the
cell surface and are immunogenic (CHAR87). Charbit et
al. (CHAR87) suggest use of these engineered strains
for development of a live bacterial vaccine; they have
not reported interest in mutagenesis of the fused
heterologous gene fragments, nor in development of
binding capabilities.

Recently Tjian and colleagues (KADO86, BRIG87, and
JONE87) have shown that DNA of definite sequence bound
to an affinity column can be used to purify proteins
that bind the DNA sequence-specifically. The proteins
are purified as much as 1000-fold in two
chromatographic steps or [illegible]-fold in a single
step.

Patents and patent applications which may be of
interest include US Patent No. 4,704,692, "Computer
Based System and Method for Determining and Displaying
Possible Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain
Polypeptides" (Ladner '692), issued to Robert Charles
Ladner on 3 November 1987 and assigned to Genex
Corporation. Ladner '692 describes a design method for
converting proteins composed of two or more chains into
proteins of fewer polypeptide chains, but with
essentially the same 3D structure. There is no
mention of variegated DNA and no genetic selection.

Robert Charles Ladner also has six patent
applications pending before the USPTO and assigned to
Genex Corporation:

07/92,110
07/21,046
07/21,047
07/34,964
07/34,965
07/34,966

Sonia K. Guterman is named as a joint inventor on
US Patent No. 4,745,056 ("Streptomyces Secretion
Vector") and on Ser. No. 21,465.

None of the Ladner or Guterman patents or
applications is believed to disclose or suggest the
present invention, but it is requested that each be
considered by the Examiner.

No admission is made that any cited reference is
prior art or pertinent prior art, and the dates given
are those appearing on the reference and may not be
identical to the actual publication date.

SUMMARY OF THE INVENTION

This invention relates to the construction,
expression, and selection of mutated genes that specify
novel proteins with desirable binding properties, as
well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets
may include other biological or synthetic
macromolecules as well as organic and inorganic
molecules.

The novel binding proteins may be obtained: 1) by
mutating a gene encoding a known binding protein within
the subsequence encoding a known binding domain, 2)
by taking such a subsequence of the gene for a first
protein and combining it with all or part of a gene for
a second protein (which may or may not be itself a
known binding protein), 3) by mutating a gene encoding
a protein which, while not possessing a known binding
activity, possesses a secondary or higher structure
that lends itself to binding activity (clefts, grooves,
etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to
cause the binding. The protein from which the novel
binding protein is derived need not have any specific
affinity for the target material.



In one embodiment, the invention relates to:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an outer-
surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by the individual packages;

The invention further relates to a method of
preparing a mixed population of replicable genetic
packages in which each package includes a gene
expressing a potential binding protein in such a manner
that the protein is presented on the outer surface of
the package. This method comprises:

i) preparing a variegated population of DNA
inserts, each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages.

In a preferred embodiment, the potential-binding-
protein-encoding inserts are incorporated into a gene
encoding an outer-surface protein of the replicable
genetic package.

The invention encompasses the design and synthesis
of variegated DNA encoding a family of potential
binding proteins characterized by constant and variable
regions, said proteins being designed with a view
toward obtaining a protein that binds a pre-determined
target.

For the purposes of this invention, the term
"potential binding protein" refers to a protein encoded
by one species of DNA molecule in a population of

[...] proteins that in fact bind to the target ("successful
binding domains"). After one or more rounds of such
enrichment, one or more of the chosen genes are
examined and sequenced. If desired, new loci of
variation are chosen. The selected daughter genes of
one generation then become the parent sequences for the
next generation of variegated DNA, beginning the next
"variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.
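
The loop of variegation, selection, and re-variegation can be pictured with a toy simulation. The following is a sketch under loud assumptions and not the procedure of the invention: real selection is by binding to a target whose ideal partner is unknown, whereas here a hypothetical scoring function toy_affinity counts matches to a made-up "ideal" sequence. The sequences, the function names, and the rule of varying two adjacent loci per cycle are all invented for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty amino acids, one-letter codes


def toy_affinity(seq, ideal):
    """Hypothetical stand-in for binding: positions matching a made-up ideal."""
    return sum(a == b for a, b in zip(seq, ideal))


def variegate(parent, loci):
    """All variants of `parent` with every amino acid tried at each chosen locus."""
    variants = []
    for combo in product(AMINO_ACIDS, repeat=len(loci)):
        residues = list(parent)
        for position, amino_acid in zip(loci, combo):
            residues[position] = amino_acid
        variants.append("".join(residues))
    return variants


def variegation_cycles(parent, ideal, loci_per_cycle=2):
    """Vary a few loci per cycle; the best variant becomes the next parent."""
    history = [parent]
    for start in range(0, len(parent), loci_per_cycle):
        loci = list(range(start, min(start + loci_per_cycle, len(parent))))
        population = variegate(parent, loci)  # 20**len(loci) variants
        parent = max(population, key=lambda s: toy_affinity(s, ideal))
        history.append(parent)
    return history


print(variegation_cycles("AAAAAA", "MKTWYV"))
# ['AAAAAA', 'MKAAAA', 'MKTWAA', 'MKTWYV']
```

Each cycle examines only 400 variants, while varying all six positions at once would require 20^6 = 64,000,000. Keeping the selected sequence as the parent of the next round is the point of the iterative scheme.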

The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships
between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used
to create a novel protein with affinity for a pre-
determined target.

Figure 3 is a stereo view of a molecular model of the
coat of the bacteriophage f1.

Figure 4 is a schematic of a PBD contacting a molecule
of target material.

Figure 5 is a stereo view of a hypothetical interaction
between BPTI and myoglobin.

Figure 6 is a schematic of the binding surface of a PBD
at various stages in the process of selecting a
successful binding domain for a hypothetical target.



Bacterial Cells as Genetic Packages
Preferred Bacterial Cells for Use as GPs
Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells
Choice of Insertion Site for IPBD in
Bacterial Cell OSP
In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Bacterial Cells

Displaying IPBD on Bacterial Spores
Preferred Bacterial Spores for Use as GPs
Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores
Choice of Insertion Site for IPBD in OSP
In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Bacterial Spores

Displaying IPBD on Outer Surface of Phages
Preferred Phages for Use as GPs
Preferred OSPs for Displaying IPBDs on Phages
Choice of Insertion Site for IPBD in OSP
In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Phages

Choice of IPBD
Influence of target size on choice of IPBD
Influence of target charge on choice of IPBD
Other considerations in the choice of IPBD

Choice of OCV

Designing the osp-ipbd Gene Insert
Genetic regulation of the osp-ipbd gene
DNA sequence design
Specific DNA sequence assignment



13.1.2 The Secondary Set
13.1.3 Choice of Residues to Vary Initially
13.2 Choosing range of variation
13.3 Design of vgDNA Encoding PBD Family
14.1 Insertion of synthetic vgDNA into plasmids
14.2 Transformation of cells
14.3 Growth of the GP(vgPBD) population
15. Isolation of GP(SBD)s with binding-to-target
phenotypes
15.1 Attaching the target material to a column
15.2 Reducing selection due to non-specific
binding
15.3 Eluting the column
15.4 Recovery of packages
15.5 Amplifying the enriched packages
15.6 Determining whether further enrichment is
needed
15.7 Characterising population
15.8 Testing of binding affinity
15.9 Other Affinity Separation Means
16.0 The Next Variegation Cycle
17.0 OTHER CONSIDERATIONS
17.1 Joint selections
17.2 Selection for non-binding
17.3 Selection of PBDs for retention of structure
17.4 Created binding proteins not unique
17.5 Other modes of mutagenesis possible
Example 1 Derivation of Novel Binding Protein for
Myoglobin Using BPTI as IPBD, M13 as GP, and
the Gene VIII Protein as OSP.






bind a chosen target, it is referred to herein as a
"binding domain" (BD). A preliminary operation is to
engineer the appearance of a stable protein domain,
denoted as an "initial potential binding domain"
(IPBD), on the surface of a genetic package. The
present invention is concerned with the expression of
numerous, diverse, variant "potential binding domains"
(PBD), all related to a "parental potential binding
domain" (PPBD) such as the binding domain of a known
binding protein, and with selection and amplification
of the genes encoding the most successful mutant PBDs.
An IPBD is chosen as PPBD to the first round of
variegation. Selection-through-binding isolates one or
more "successful binding domains" (SBD). An SBD from
one round of variegation and selection-through-binding
is chosen to be the PPBD for the next round. The
invention is not, however, limited to proteins with a
single BD since the method may be applied to any or all
of the BDs of the protein, sequentially or
simultaneously. The relationships of the various BDs
are illustrated in Figure 1.

Conventionally, DNA sequences are written from 5'
to 3', left-to-right showing only the sequence that
will appear as mRNA (with each T of DNA changed to U in
mRNA).

protein:             M - L - F - ...

anti-sense DNA:   5' ATG CTT TTC ... 3'

sense DNA:        3' TAC GAA AAG ... 5'

mRNA:             5' AUG CUU UUC ... 3'
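
The convention illustrated above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the strand written 5' to 3' is taken to be the one labeled "anti-sense" in this specification.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna):
    """Return the base-by-base complement in the same left-to-right order.

    A 5'->3' input therefore yields the partner strand read 3'->5'.
    """
    return "".join(COMPLEMENT[base] for base in dna)


def to_mrna(written_strand):
    """The mRNA matches the written (5'->3') strand with each T changed to U."""
    return written_strand.replace("T", "U")


written = "ATGCTTTTC"           # strand written 5' -> 3'
template = complement(written)  # partner strand, read 3' -> 5'
mrna = to_mrna(written)
```

Here `template` is the strand used as the template for mRNA synthesis, and `mrna` has the same sequence as the written strand apart from U replacing T.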

The complementary strand is the one used as template
for mRNA synthesis and so is called the "sense strand";
we will use this convention throughout. Although this
the analyte can be freed from the affinity material
once the impurities are washed away.

Affinity column chromatography involves chemically
attaching the affinity material to an inert solid
support matrix that is held in a column so that
solutions can be passed over the matrix in a controlled
way. Mixtures that might contain the analyte are
passed over the matrix to which any analyte component
in the mixture adheres. Separation is achieved by
passing a gradient of some type over the matrix and
collecting fractions. It is also possible to recover
purified material from the matrix by other means after
impurities have been washed away.

An alternative to column affinity chromatography
is batch elution from an affinity matrix material held
in some container. Affinity material is chemically
bound to the matrix. A mixture that might contain the
analyte is added and the matrix is rinsed with buffer.
The material is rinsed with a series of buffers
containing increasing concentrations of solutes chosen
to wash impurities away. The analyte is recovered in
purified form either in one of the buffer fractions or
bound to the matrix.

Another alternative to column affinity chromatography
is batch elution from a plate. The affinity
material can be chemically bound to a flat surface,
such as the bottom of a polystyrene petri dish. A
mixture that might contain the analyte is added to the
plate and the plate is rinsed with a buffer.
Subsequently, the plate is washed with a series of
buffers containing increasing concentrations of solutes
chosen to separate components having lower affinity for
or cells. It has been used to separate bacteriophages
on the basis of charge (SERW87).

The present invention makes use of affinity
separation of bacterial cells, or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.

In the present invention, the words "grow",
"growth", "culture", and "amplification" mean increase
in number, not increase in size of individual cells or
phage. In the present invention, the words "select"
and "selection" are used in the genetic sense; i.e. a
biological process whereby a phenotypic characteristic
is used to enrich a population for those organisms
displaying the desired phenotype. Choices or elections
to be made by humans are indicated by "choose", "pick",
"take", etc., but not "select".

The process of the present invention comprises
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP; the combination is
denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD-)s, neither of which
binds the known affinity molecule, and



3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.1, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 3); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).

In another preferred embodiment, Part I of the process
involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (1.2.1), or phage (1.3.1)
having a suitable outer surface protein (Secs.
1.1.2, 1.2.2 and 1.3.2),

2) choosing a stable IPBD (Sec. 2),

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence and b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment; and c)
domain. References to PBD or SBD in Part I are to
indicate a preparatory intent.


In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD). To establish the sensitivity of
the affinity separation process, we separate small
amounts of GP(IPBD) from much larger amounts of wtGP.
In a preferred embodiment, Part II of the process of
the present invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded, gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal condition (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
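
Steps 5 and 6 can be illustrated numerically. The sketch below is an assumption-laden illustration, not the procedure of Sec. 10: it defines enrichment per pass as the ratio of GP(IPBD) to wtGP after a pass divided by the same ratio before the pass, and it assumes that this factor is constant from pass to pass. The function names and the example counts are hypothetical.

```python
def enrichment_factor(binders_in, others_in, binders_out, others_out):
    """Enrichment per pass: (binder/other ratio out) / (binder/other ratio in)."""
    return (binders_out * others_in) / (others_out * binders_in)


def passes_needed(initial_ratio, per_pass, target_ratio=1.0):
    """Count passes until binders reach `target_ratio` relative to non-binders.

    Assumes the same enrichment factor applies on every pass.
    """
    if per_pass <= 1.0:
        raise ValueError("enrichment per pass must exceed 1")
    ratio, passes = initial_ratio, 0
    while ratio < target_ratio:
        ratio *= per_pass
        passes += 1
    return passes


# Hypothetical example: one GP(IPBD) per million wtGP loaded,
# one GP(IPBD) per thousand wtGP recovered.
factor = enrichment_factor(1, 1_000_000, 1, 1_000)
n_passes = passes_needed(1 / 999_999, factor)
```

Under these assumptions a thousand-fold enrichment per pass brings a one-in-a-million component to parity with the background in two passes.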



3) picking a set of several residues in the PPBD
to vary; the principal indicators of which
residues to vary include: a) the 3D structure of
the IPBD, b) sequences of homologous proteins, and
c) computer or theoretical modeling that indicates
which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
13.1),

4) picking a subset of the residues picked in Part
III.3, to be varied simultaneously (Sec. 13.1);
the principal considerations are the number of
different variants and which variants are within
the detection capabilities of the affinity
separation determined in Part II, and setting the
range of variation (Sec. 13.2);

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (e.g.
a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),
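
The consideration in step 4, the number of different variants, follows directly from the nucleotide mixture used at each varied codon. The Python sketch below counts variants for one such mixture. The NNK scheme (any base at the first two positions, G or T at the third) is used purely as an illustration and is an assumption; it is not taken from this specification. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for_mixture(first, second, third):
    """All codons obtainable when each position is drawn from a base mixture."""
    return ["".join(c) for c in product(first, second, third)]


# Illustrative mixture (an assumption): N = A/C/G/T, K = G/T.
nnk = codons_for_mixture("ACGT", "ACGT", "GT")
encoded = {CODON_TABLE[c] for c in nnk}
amino_acids = encoded - {"*"}  # "*" marks a stop codon


def library_sizes(n_positions):
    """Return (distinct DNA sequences, distinct full-length protein sequences)."""
    return len(nnk) ** n_positions, len(amino_acids) ** n_positions


dna_variants, protein_variants = library_sizes(5)
```

Comparing `dna_variants` with the number of transformants that can be obtained indicates whether a chosen set of residues can be sampled adequately in a single variegation step.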



Abbreviation     Meaning

GP               Genetic Package, e.g. a bacteriophage

wtGP             Wild-type GP

X                Any protein

x                The gene for protein X

IPBD             Initial Potential Binding Domain, e.g. BPTI

PBD              Potential Binding Domain, e.g. a derivative of BPTI

SBD              Successful Binding Domain, e.g. a derivative of BPTI
                 selected for binding to a target

PPBD             Parental Potential Binding Domain, i.e. an IPBD or an
                 SBD from a previous selection

OSP              Outer Surface Protein, e.g. coat protein of a phage or
                 LamB from E. coli

OSP-PBD          Fusion of an OSP and a PBD, order of fusion not
                 specified

OSTS             Outer Surface Transport Signal
GP(x)            A genetic package containing the x gene

GP(X)            A genetic package that displays X on its outer
                 surface

GP(osp-pbd)      GP containing an osp-pbd gene

GP(OSP-PBD)      A genetic package that displays PBD on its outside as
                 a fusion to OSP

GP(pbd)          GP containing a pbd gene, osp implicit

GP(PBD)          A genetic package displaying PBD on its outside, OSP
                 unspecified

{Q}              An affinity matrix supporting "Q", e.g. {T4 lysozyme}
                 is T4 lysozyme attached to an affinity matrix

AfM(W)           A molecule having affinity for "W", e.g. trypsin is an
                 AfM(BPTI)

                 AfM(W) carrying a label, e.g. 125I

                 A chemical that can induce expression of a gene, e.g.
                 IPTG for the lacUV5 promoter

OCV              Operative Cloning Vector

K_T              K_T = [T][SBD]/[T:SBD] (T is a target)

K_N              K_N = [N][SBD]/[N:SBD] (N is a non-target)

                 Density of AfM(W) on affinity matrix

                 Most-favored amino acid

                 Least-favored amino acid

                 Abundance of DNA molecules encoding amino acid x

                 Outer membrane protein

nt               Nucleotide

                 A bimolecular dissociation constant,
                 K_d = [A][B]/[A:B]

                 Signal-sequence Peptidase I

                 Yield of ssDNA up to Q bases long

                 Maximum length of ssDNA that can be synthesized in
                 acceptable yield

                 Yield of plasmid DNA per volume of culture

                 DNA ligation efficiency

                 Maximum number of transformants produced from
                 Y_D100 DNA of insert

                 Efficiency of chromatographic enrichment, enrichment
                 per pass

                 Sensitivity of chromatographic separation, can find
                 1 in S

                 Maximum number of enrichment cycles per variegation
                 cycle

                 Error level in synthesizing vgDNA


Sec. 0.1: Standard sequencing method:

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. In the preferred embodiment,
plasmids are isolated and denatured in the presence of
a sequencing primer, about 20 nts long, that anneals to
a region adjacent, on the 5' side, to the region of
interest. This plasmid is then used as the template in
the four sequencing reactions with one dideoxy
substrate in each. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of
protein-sequence determination may be needed to detect
post-translational processing.
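
Inferring the amino acid sequence from the DNA sequence is a table lookup over successive codons. A minimal Python sketch follows, assuming the standard genetic code and a coding strand supplied 5' to 3' in the correct reading frame; the function name `translate` is hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA into one-letter amino acid codes.

    Translation stops at the first stop codon; trailing bases that do not
    fill a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


peptide = translate("ATGCTTTTC")
```

A sequence inferred in this way says nothing about post-translational processing, which is why direct protein sequencing may still be needed in some cases.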



The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.



Sec. 1: Specification of Genetic Package and Means for
Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages

It is emphasized that the GP on which
selection-through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery
of the encapsulated genetic message. During at least
part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^6 or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
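
The requirement for exponential growth can be made concrete with a small calculation. The Python sketch below assumes ideal exponential growth with a fixed doubling time; the function names and the 20-minute doubling time in the example are illustrative assumptions.

```python
import math


def doublings_needed(amplification_factor):
    """Smallest whole number of doublings giving at least the requested factor."""
    return math.ceil(math.log2(amplification_factor))


def amplification_time_minutes(amplification_factor, doubling_time_minutes):
    """Time for ideal exponential growth to achieve the requested factor."""
    return doublings_needed(amplification_factor) * doubling_time_minutes


# Restoring a component that was one part in 10^6 to its original abundance.
n_doublings = doublings_needed(1e6)
minutes = amplification_time_minutes(1e6, doubling_time_minutes=20)
```

Because each doubling multiplies the population by two, a million-fold amplification needs only about twenty doublings, which is why culturing is such a powerful amplification step.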

A GP may typically be a vegetative bacterial cell,
a bacterial spore or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.

We believe that it is possible to cause a genetic
package to display the IPBD or PBD on its outer surface
without adversely affecting the viability of the GP or
the binding characteristics of the IPBD or PBD.

It is generally believed that the part of the
polypeptide chain composing one domain folds almost
independently of the parts composing other domains.
There are natural proteins composed of two or more
domains for which there is strong evidence that
essentially the same domain occurs more than once; for
example ovomucoids and ovoinhibitors (SCOT87) and
kallikrein (CHUN86). Furthermore, essentially the same
domain can occur in several different proteins (SUDH85,
GILB85, and SCOT87).

Rossman (ROSS81) and others have pointed out that
the 3D structure of individual domains can be preserved
during protein evolution, even after the amino acid
sequences have diverged so much that no significant
homology can be detected. Hollecker and Creighton
(HOLL83) studied the folding pathways of two black
mamba venom proteins (called I and K) that are
homologous to BPTI. Although the sequences of I and K
are clearly related to BPTI by the identity of 19 and
23 residues respectively, including all six cysteine
residues, there are 33 and 34 differences. Not only
are the 3D structures of the proteins very similar, but
the pathway of folding has also been conserved.

When gene fragments coding for two domains from
different proteins have been joined by genetic
engineering and expressed, the domains from the
original proteins sometimes fold independently while
tethered to each other (TOTH86, SMIT85, MANO86). If
the insertion is the gene for the entire protein, that
protein may be converted into a domain of the larger
protein. Fusions of genes that determine the domains,
however, must be done at or near domain junctions, or
domain function may be impaired (C3A'<,-$7, TOTH86). In
some cases, the inserted domain will fold, but the
recipient protein will not; Beckwith's fusions of malF
and phoA genes (BECK83, MANO86) gave rise to functional
PhoA domains attached to a fragment of MalF that
anchored the chimeric protein in the lipid bilayer.
The MalF protein was incomplete and could not function.

There are two basic methods of arranging that the
ipbd gene is expressed in such a manner that the IPBD
is displayed on the outer surface of the GP.

First, DNA encoding the IPBD sequence may be
operably linked to DNA encoding all or part of an outer
surface protein (OSP) native to the GP. If one or more
fusions of fragments of x genes to fragments of a
natural osp gene are known to cause X protein domains
to appear on the GP surface, then we pick the DNA
sequence in which an ipbd gene fragment replaces the x
gene fragment in one of the successful osp fusions as
a preferred gene to be tested for the display-of-IPBD
phenotype. (The gene may be constructed in any
manner.) If no fusion data are available, then we fuse
an ipbd fragment to various fragments, such as
fragments that end at known or predicted domain
boundaries, of the osp gene and obtain GPs that display
the osp-ipbd fusion on the GP outer surface by
screening or selection for the display-of-IPBD
phenotype. The fusion of ipbd and osp fragments may
also include fragments of random or pseudorandom DNA to
produce a population, members of which may display IPBD
on the GP surface. The members displaying IPBD are
isolated by screening or selection for the
display-of-binding phenotype.

While most bacterial proteins remain in the
cytoplasm, others are transported to the periplasmic
space (which lies between the plasma membrane and the
cell wall of gram-negative bacteria), or are conveyed
and anchored to the outer surface of the cell. Still
others are exported (secreted) into the medium
surrounding the cell. Those characteristics of a
protein that are recognized by a cell and that cause it
to be transported out of the cytoplasm and displayed on
the cell surface will be termed "outer-surface
transport signals".

It is believed that the conditions for an outer
surface transport signal are not particularly
stringent, i.e., a random polypeptide of appropriate
length (preferably 30-100 amino acids) has a reasonable
chance of providing such a signal. Thus, by
constructing a chimeric gene comprising a segment
encoding the IPBD linked to a segment of random or
pseudorandom DNA (the potential OSTS), and placing this
gene under control of a suitable promoter, there is a
possibility that the chimeric protein so encoded will
function as an OSP-IPBD.

This possibility is greatly enhanced by
constructing numerous such genes, each having a
different potential OSTS, cloning them into a suitable
host, and selecting for transformants bearing the IPBD
(or other marker) on their outer surface.
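
The benefit of building many constructs can be put in rough numbers. The sketch below is an illustration only: it assumes that each random segment independently has the same probability p of acting as a transport signal, and the values of p shown are hypothetical, not measurements from this disclosure. Under that assumption the chance that at least one of n constructs works is 1 - (1 - p)^n.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent constructs works."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def constructs_needed(p: float, target: float) -> int:
    """Smallest n for which prob_at_least_one(p, n) reaches the target."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-construct success rates, chosen only for illustration.
    for p in (0.001, 0.01, 0.1):
        print(p, constructs_needed(p, 0.99))
```

Even a small per-construct probability becomes a near certainty once the population is large, which is why a selection step over many transformants is practical.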

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the
osp-ipbd gene) through the selection-through-binding
process, see Sec. 14, is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.




Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size
of at least 100/infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4--, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Although knowledge of specific OSPs may not be
required for vegetative bacterial cells and endospores,
the user of the present invention, preferably, will
know: Is the sequence of any osp known? (preferably
yes, at least one required for phage). How does the
OSP arrive at the surface of GP? (knowledge of route
necessary, different routes have different uses, no
route preferred per se). Is the OSP
post-translationally processed? (no processing most
preferred, predictable processing preferred over
unpredictable processing). What rules are known
governing this processing, if there is any processing?
(no processing most preferred, predictable processing
acceptable). What function does the OSP serve in the
outer surface? (preferably not essential). Is the 3D
structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x
known? Does expression of these fusions lead to X
appearing on the surface of the GP? (fusion data is as
preferred as knowledge of a 3D structure). Is a "2D"
structure of an OSP available? (in this context, a "2D"
structure indicates which residues are exposed on the
cell surface) (2D structure less preferred than 3D
structure). Where are the domain boundaries in the
OSP? (not as preferred as a 2D structure, but
acceptable). Could IPBD go through the same process as
OSP and fold correctly? (IPBD might need prosthetic
groups) (preferably IPBD will fold after same
process). Is the sequence of an osp promoter known?
(preferably yes). Is an osp gene controlled by a
regulatable promoter available? (preferably yes). What
activates this promoter? (preferably a diffusible
chemical, such as IPTG). How many different OSPs do we
know? (the more the better). How many copies of each
OSP are present on each package? (more is better).

The user will want knowledge of the physical
attributes of the GP: How large is the GP? (knowledge
useful in deciding how to isolate GPs) (preferably easy
to separate from soluble proteins such as IgGs). What
is the charge on the GP? (neutral preferred). What is
the sedimentation rate of the GP? (knowledge preferred,
no particular value preferred).

The preferred GP, OCV and OSP are those for which
the fewest serious obstacles can be seen, rather than
those that score highest on any one criterion.
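
The selection rule just stated (fewest serious obstacles, not best single score) can be sketched as a small comparison. Everything in this example is hypothetical: the candidate names, the criteria, and the three-level rating scale ("ok", "minor", "serious") are assumptions made only to show the rule, not data from this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate genetic package with one rating per criterion."""
    name: str
    # criterion -> "ok" | "minor" | "serious" (an assumed rating scale)
    ratings: dict = field(default_factory=dict)

    def serious_obstacles(self) -> int:
        """Count the criteria rated as serious obstacles."""
        return sum(1 for r in self.ratings.values() if r == "serious")


def pick_preferred(candidates):
    """Return the candidate with the fewest serious obstacles.

    Ties are resolved in favor of the earliest candidate in the list.
    """
    return min(candidates, key=lambda c: c.serious_obstacles())


# Hypothetical candidates and ratings, for illustration only.
candidates = [
    Candidate("package A", {"storage": "ok", "osp_known": "serious", "harvest": "ok"}),
    Candidate("package B", {"storage": "ok", "osp_known": "ok", "harvest": "minor"}),
    Candidate("package C", {"storage": "serious", "osp_known": "serious", "harvest": "ok"}),
]

if __name__ == "__main__":
    print(pick_preferred(candidates).name)
```

Counting obstacles rather than summing scores means that one outstanding property cannot compensate for a disqualifying weakness elsewhere.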

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phages (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as haem or an Fe4S4 cluster,
that are available within the cell, but not in the
medium. The formation of Fe4S4 clusters found in some
ferredoxins is catalyzed by enzymes found in the cell
(BONO85). IPBDs that require such prosthetic groups
may fail to fold or function if displayed on bacterial
cells.

Sec. 1.1.1: Preferred Bacterial Cells as GPs:

The species chosen should have a well-
characterized genetic system and strains defective in
genetic recombination should be available. The chosen
strain may need to be manipulated to prevent changes of
its physiological state that would alter the number or
type of proteins or other molecules on the cell surface
during the affinity separation procedure. In view of
the extensive knowledge of E. coli, a strain of E.
coli, defective in recombination, is the strongest
candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through
the use of regulated promoters such as lacUV5, trpP, or
tac (MANI82). The factors that regulate the quantity
of protein synthesized include: a) promoter strength
(cf. HOOP87), b) rate of initiation of translation (cf.
GOLD87), c) codon usage, d) secondary structure of
mRNA, including attenuators (cf. LAND87) and
terminators (cf. YAGE87), e) interaction of proteins
with mRNA (cf. MCPH86, MILL87b, WINT87), f) degradation
rates of mRNA (cf. SUBB83), g) proteolysis (cf.
GOTT87). These factors are sufficiently well
understood that a wide variety of heterologous proteins
can now be produced in E. coli or B. subtilis in at
least moderate quantities (SKER88, BETT88).

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMP), that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). The rules that govern
the localization of OMP-X fusion proteins are not yet
fully elucidated. Many OMPs are polymeric and non-
essential; a non-essential OMP is preferred. A non-
essential OMP for which there is knowledge of which
residues are on the cell surface is more preferred. A
non-essential OMP for which there is data showing that
X is displayed as part of an OMP-X fusion is most
preferred. If no fusion data are available, then we
fuse an ipbd fragment to various fragments of the osp
gene and obtain GPs that display the OSP-IPBD fusion on
the cell outer surface by screening or selection for
the display-of-IPBD phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA87) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal-
sequence which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues l-O are insufficient.
The rules that govern localization of proteins in the
outer membrane of Gram-negative bacteria remain vague.
Kaiser et al. (KAIS87) showed that the export signal in
Saccharomyces cerevisiae is very broad, because when
they fused random human DNA sequences to DNA coding for
mature invertase, about one fifth of the sequences
resulted in the appearance of invertase free in the
medium.

The outer membrane protein LamB of E. coli is a
porin for maltose and maltodextrin transport, and
serves as the receptor for adsorption of bacteriophages
lambda and K10. This protein has been purified to
homogeneity (ENDE78) and shown to function as a trimer
(PALV79). Mutations to phage resistance have been used
to define the parts of the LamB protein that adsorb
each phage (ROAM80, CLEM81, CLEM83, GEHR87). Phage-
resistance mutations are dominant (MARC82), suggesting
that there is no preferential assembly of wild-type or
mutant subunits.

In lamB+ cells, addition of maltose or
maltodextrin inhibits a form of motility called cell
swarming, and lamB mutants defective in this process
have been characterized (HEIN88). These mutations have
been sequenced and compared to the wild-type sequence
(CLEM81) and the concomitant protein domains have been
analyzed (CLEM83). Topological models have been
developed that describe the function of phage receptor
and maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CHAR84, HEIN88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF, and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

General knowledge on processing of signal
sequences in E. coli is relevant to the present
invention both for use of E. coli per se and for use in
conjunction with filamentous phage (vide infra).
Genetic experiments on processing of signal sequences
indicate that if the S21-F22-A23 sequence is preserved,
signal peptidase (SP-I) will cleave after A23 (OLIV87).
Many examples have been cited in which the DNA coding
for the leader or signal sequence from one protein has
been attached to the DNA sequence coding for another
protein, protein X (BECK83, INOU86 Ch10, LEEC86,
MARK86). Expression of such a chimeric gene often
causes protein X to appear free in the periplasm. That
is, the leader causes the new protein to be secreted
through the lipid bilayer; once in the periplasm, it is
cleaved off by SP-I.

Beckwith (BECK83 and MANO86) has shown that when
the phoA gene is inserted in frame into the coding
sequence for an integral membrane protein, for example
MalF, the PhoA domain is localized according to where
in the integral membrane protein the phoA gene was
inserted. That is, if phoA is inserted after an amino
acid which normally is found in the cytoplasm, then
PhoA appears in the cytoplasm. If phoA is inserted
after an amino acid normally found in the periplasm,
however, then the PhoA domain is localized on the
periplasmic side of the membrane, and anchored in it.

Beckwith and colleagues (BECK83) have extended
these observations to the lacZ gene that can be
inserted into genes for integral membrane proteins such
that the LacZ domain appears in either the cytoplasm or
the periplasm according to where the lacZ gene was
inserted.

Sec. 1.1.3: Choice of Insertion Site for IPBD in
Bacterial OSP:

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an osp-
ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for display-of-IPBD phenotype. We use
the available data about OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the
osp fragments to ipbd; cells expressing the fusion are
screened or selected which display IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.
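
The truncate-and-fuse strategy amounts to building one in-frame fusion per candidate truncation point. The sketch below shows only the bookkeeping; the two DNA strings are short placeholders invented for the example (they are not real osp or ipbd sequences), and the truncation points are arbitrary.

```python
def fuse_at_codon(osp_cds: str, ipbd_cds: str, n_codons: int) -> str:
    """Keep the first n_codons codons of osp and append ipbd in frame."""
    if len(osp_cds) % 3 or len(ipbd_cds) % 3:
        raise ValueError("coding sequences must consist of whole codons")
    if not 0 < n_codons <= len(osp_cds) // 3:
        raise ValueError("truncation point lies outside the osp sequence")
    return osp_cds[: 3 * n_codons] + ipbd_cds


def fusion_library(osp_cds: str, ipbd_cds: str, truncation_codons):
    """Map each truncation point (in codons) to its fused sequence."""
    return {n: fuse_at_codon(osp_cds, ipbd_cds, n) for n in truncation_codons}


# Placeholder sequences, hypothetical and for illustration only.
OSP = "ATGAAAGCTGGTCTGACCGAAGTTCCG"   # 9 codons
IPBD = "TGCCCGGAATGC"                  # 4 codons

library = fusion_library(OSP, IPBD, [3, 6, 9])

if __name__ == "__main__":
    for n, seq in library.items():
        print(n, seq, len(seq) // 3)
```

Cutting only at codon boundaries keeps every fusion in frame, so a failure to display can be attributed to the choice of site rather than to a frameshift.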

The promoter for the osp-ipbd gene, preferably, is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (lac promoter). It
need not come from a natural osp gene; any regulatable
bacterial promoter can be used.



Once a genetic packaging system employing
cells has been designed, [...]

[...] time we choose a [...]
specifically sticky so that GPs displaying incomplete
PBDs are easily removed from the population.

The random DNA can be generated from any DNA
having high sequence diversity by partially digesting
with an enzyme that cuts very often. Sau3AI, for
example, generates cohesive-ended DNA that can be
cloned into a BamHI or BglII site. Alternatively,
one could shear DNA having high sequence diversity,
blunt the sheared DNA with the large fragment of E.
coli DNA polymerase I (hereinafter referred to as
Klenow fragment), and clone the sheared and blunted DNA
into blunt sites of the vector (MANI82, p295, AUSU87:
5.1.1).
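
A partial digest with a frequent cutter yields fragments bounded by any two recognition sites, not only adjacent ones. The sketch below enumerates them for Sau3AI, which recognizes GATC and cuts immediately 5' of the G, leaving a GATC overhang compatible with BamHI and BglII ends. The input sequence is a made-up placeholder, and only the top strand is modeled.

```python
import re
from itertools import combinations

SITE = "GATC"  # Sau3AI recognition sequence; the cut falls just before the G


def cut_positions(seq: str):
    """0-based top-strand indices of the cuts (the start of each GATC)."""
    return [m.start() for m in re.finditer(SITE, seq.upper())]


def partial_digest_fragments(seq: str):
    """All fragments bounded by two cut sites; internal sites may stay uncut."""
    cuts = cut_positions(seq)
    return [seq[a:b] for a, b in combinations(cuts, 2)]


# Placeholder sequence, hypothetical and for illustration only.
EXAMPLE = "AAGATCTTTGATCCCCGATCGG"

if __name__ == "__main__":
    print(cut_positions(EXAMPLE))
    for fragment in partial_digest_fragments(EXAMPLE):
        print(fragment)
```

With k sites there are k(k-1)/2 such fragments, which is the reason a partial digest gives a much more diverse insert population than a complete digest.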

Sec. 1.2: Displaying IPBD on Bacterial Spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
Moreover, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages for the purposes of the present
invention.

Plasmid DNA is commonly included in spores.
Plasmid encoded proteins have been observed on the
surface of Bacillus spores (DEBR86). Sporulation
involves complex temporal regulation that is now
moderately well understood (LOSI86). Special sigma
factors, such as sigma E, are produced during
sporulation. RNA polymerase bound to a sporulation
sigma factor recognizes promoters that are not
recognized by RNA polymerase bound to a vegetative
sigma factor. The sequences of several sporulation
promoters are known; coding sequences operatively
linked to such promoters are expressed only during
sporulation. Ray et al. (RAYC87) have shown that the
G4 promoter of B. subtilis is directly controlled by
RNA polymerase bound to sigma E.

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-
terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(ERRI88), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y) of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT85, WAIT86). CotC contains 16 D and E
amino acids, nearly equal to the 19 Ks. There are no
A, F, R, I, L, N, P, Q, S, or W amino acids in CotC.
Neither CotC nor CotD is post-translationally cleaved.
The proteins CotA and CotB are post-translationally
cleaved.

Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation is much
more developed for B. subtilis than for other spore-
forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Spores are exposed to an
oxidative environment after release from the mother
cell, so that disulfides, if any, within the IPBD might
form. Many vegetative biochemical pathways are shut
down when sporulation begins so that prosthetic groups
might not be available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are post-
translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not post-
translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
as is the lac promoter of E. coli. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage. Chemically inducible sporulation
promoters can be developed if necessary.

Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP
of Bacterial Spore:

The considerations governing insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Sec. 1.2.4: In vivo Selection for Pseudo-osp Genes
From Random DNA Inserts in Bacterial Spores:

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above at 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phages:

Sec. 1.3.1: Preferred Phages for Use as GPs:

Unlike the case for bacterial cells and spores,
choice of a phage depends strongly on knowledge of the
3D structure of an OSP and how it interacts with other
proteins in the capsid. The size of the phage genome
and the packaging mechanism are also important because
the phage genome itself is the cloning vector. The
osp-ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are:

1) the morphogenetic pathway of the phage
determines the environment in which the IPBD will
have opportunity to fold,

2) IPBDs containing essential disulfides may not
fold within a cell,

3) IPBDs needing large or insoluble prosthetic
groups may not fold if secreted because the
prosthetic group is lacking, and

4) when variegation is introduced in Part III,
multiple infections could generate hybrid GPs that
carry the gene for one PBD but have at least some
copies of a different PBD on their surfaces; it is
preferable to minimize this possibility.

Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.

Filamentous phage:

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
(SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is well known as is the physical
structure of the virion (BANN81, BOEK80, CHAN79,
ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78,
I'XSSIZ, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78,
and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Filamentous phage enter E. coli through the sex
pilus of cells bearing the F-factor. Achtman et al.
(ACHT78) observed that the pilus is extraordinarily
sensitive to SDS; 0.03% SDS inhibits binding of MS2 to
pilin in vitro. Infection may therefore be inhibited
by SDS.

The 50 amino acid mature coat protein is
synthesized as a 73 amino acid precoat (ITOK79). The
first 23 amino acids constitute a typical signal-
sequence which causes the nascent polypeptide to be
inserted into the inner cell membrane.

An E. coli signal peptidase (SP-I) recognizes
amino acids 16, 21, and 23, and, to a lesser extent,
residue 22, and cuts between residues 23 and 24 of the
precoat (KUHN85a, KUHN85b, OLIV87). (See also sec.
1.1.2 for general knowledge on secretion in E. coli.)
After removal of the signal sequence, the amino
terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy
terminus is on the cytoplasmic side. About 3000 copies
of the mature 50 amino acid coat protein associate
side-by-side in the inner membrane.
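
The residue arithmetic of precoat processing (73 residues, cut between 23 and 24, leaving 50) can be checked with a few lines. In the sketch below the precoat is a placeholder string: only its length and the S21-F22-A23 positions are taken from the text, and every other residue is written as "x" because no sequence is given here.

```python
SIGNAL_LENGTH = 23    # residues 1-23 form the signal sequence
PRECOAT_LENGTH = 73   # length of the precoat


def cleave_signal(precoat: str, cut_after: int = SIGNAL_LENGTH):
    """Split a precoat into (signal, mature) at the signal peptidase cut."""
    if len(precoat) <= cut_after:
        raise ValueError("precoat is not longer than the signal sequence")
    return precoat[:cut_after], precoat[cut_after:]


# Placeholder precoat: 'x' marks an unspecified residue (an assumption made
# for illustration); S, F and A occupy positions 21, 22 and 23.
placeholder = "x" * 20 + "SFA" + "x" * 50

signal, mature = cleave_signal(placeholder)

if __name__ == "__main__":
    print(len(placeholder), len(signal), len(mature), signal[-3:])
```

Because the cut position is counted from the amino terminus of the precoat, anything fused after residue 23 ends up at the amino terminus of the mature protein, on the periplasmic side.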

The gene VI, VII, and IX proteins are also present
at the ends of the virion in about five copies each.
The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a
helical sheath of protein (WEBS78). The DNA does not
base pair (that would impose severe restrictions on the
virus genome); rather the bases intercalate with each
other independent of sequence. Because the M13 genome
is extruded through the membrane and coated by a large
number of identical protein molecules, it can be used
as a cloning vector (WATS87 p273, and MESS77). Thus we
can insert extra genes into M13 and they will be
carried along in a stable manner.

Marvin and collaborators (MARV78, MAKO80, BANN81)
have determined an approximate 3D virion structure of
f1 by a combination of genetics, biochemistry, and X-
ray diffraction from fibers of the virus. Figure 3 is
drawn after the model of Banner et al. (BANN81) and
shows only the C-alphas of the protein. The apparent
holes in the cylindrical sheath are actually filled by
protein side groups so that the DNA within is
protected. The amino terminus of each protein monomer
is to the outside of the cylinder, while the carboxy
terminus is at smaller radius, near the DNA. Although
other filamentous phages (e.g. Pf1 or Ike) have
different helical symmetry, all have coats composed of
many short alpha-helical monomers with the amino
terminus of each monomer on the virion surface.

Bacteriophage PhiX174:

The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid so that the host supplies this
protein.

Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-
stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the viral infected
cells.



Large DNA Phages

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage.

RNA Phages

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.

MS2 is a typical small RNA phage that carries only
three genes that are tightly regulated through RNA
structure and protein-RNA interactions. The RNA fills
the protein capsid so that no additional genes can be
accommodated. To use MS2 as a GP, we would need to
eliminate most of the natural viral genome so that an
osp-ipbd gene could fit into the protein capsid. It is
known that the A protein binds sequence-specifically to
a site at the 5' end of the + RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. The viral RNA
replicase gene is not needed because all components
needed for formation of particles are encoded in DNA. A
package comprising RNA encapsulated by proteins encoded
by that RNA satisfies the major criterion that the
genetic message inside the package specifies something
on the outside. The particles by themselves are not
viable. After isolating the packages that carry an
SBD, we would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV
or MMTV reverse transcriptase, and

3) use Thermus aquaticus DNA polymerase for 25 or
more cycles of Polymerase Chain Reaction to
amplify the DNA until there is enough to subclone
the recovered genetic message into a plasmid for
sequencing and further work.
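
The amplification step above can be given a rough scale with a short
calculation. The following is a minimal sketch, assuming an idealized
reaction in which every cycle multiplies the number of copies by
(1 + efficiency); the function name and the efficiency parameter are
illustrative assumptions and are not part of the procedure described
in the text.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield (assumption: constant per-cycle efficiency).

    Each cycle multiplies the copy number by (1 + efficiency), so a
    perfectly efficient reaction doubles the template every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a less efficient reaction over 25 cycles.
    for eff in (1.0, 0.8):
        print(f"efficiency={eff}: {pcr_copies(1000, 25, eff):.3e} copies")
```

Under this idealization, 25 cycles of perfect doubling amplify each
template molecule by a factor of 2 to the 25th power; real reactions
plateau well before that, so the result is only an upper bound.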

Alternatively, helper phage could be used to rescue the
isolated phage. In one of these ways we can recover a
sequence that codes for an SBD having desirable binding
properties. The in vitro amplification (SAIK85,
SCHA86, US Patents 4,683,202 and 4,683,195) may be
conveniently carried out using a Perkin-Elmer/Cetus
Thermal Cycler (part number N801-0150) and GeneAmp DNA
Amplification Reagent Kit (NS01-00O) supplied by
Perkin-Elmer Corp., 761 Main Avenue, Norwalk, CT,
06859-0012, USA. The primers used in the Polymerase
Chain Reaction should be picked so that the osp-pbd
gene is the part of the reverse-transcribed DNA that is
amplified.

Although such a procedure is much more cumbersome
than use of DNA phage, it may be of interest if: 1) the
genetic package of the RNA phage is much more stable
than any DNA phage, 2) the 3D structure of an RNA phage
is known (f2 forms crystals inside E. coli, suggesting
that structure determination of f2 virion may be
practical), or 3) folding of a large protein inside a
cell is desired (this scheme allows almost the entire
3.5 kb genome of MS2 to be used for chimeric coat
protein-PBD). Use of fusions involving MS2 coat
protein, together with wild-type coat protein, to
encapsulate genes demonstrates the most primitive
system that could be employed in the present invention.
Although the system has certain technical
inconveniences and therefore is not preferred, it could
be used.

Sec. 1.3.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Phages:

For a given bacteriophage, the preferred OSP is
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to wild-
type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover,
a protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to result in reduction in viability
of the GP.

It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted
either into a second copy of the recipient osp gene or
into a novel engineered osp gene. It is preferred that
the osp-ipbd gene be placed under control of a
regulated promoter. Our process forces the evolution
of the PBDs derived from IPBD so that some of them
develop a novel function, viz. binding to a chosen
target. Placing the gene that is subject to evolution
on a duplicate gene is an imitation of the widely-
accepted scenario for the evolution of protein
families. It is now generally accepted that gene
duplication is the first step in the evolution of one
protein from an ancestral protein. By having two
copies of a gene, the affected physiological process
can tolerate mutations in one of the genes. This
process is well understood and documented for the
globin family (cf. DICK83, p65ff, and CREI84, p117-
125).

The preferred OSP for use when the GP is M13 is
the gene III protein (see Example 1).

Sec. 1.3.3: Choice of Insertion Site for IPBD in OSP:

The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Filamentous
phage can be described by a helical lattice; isometric
phage, by an icosahedral lattice. Each monomer of each
major coat protein sits on a lattice point and makes
defined interactions with each of its neighbors.
Proteins that fit into the lattice by making some, but
not all, of the normal lattice contacts are likely to
destabilize the virion by: a) aborting formation of the
virion, b) making the virion unstable, or c) leaving
gaps in the virion so that the nucleic acid is not
protected. Thus in bacteriophage, unlike the cases of
bacteria and spores, it is important to retain most or
all of the residues of the parental OSP in engineered
OSP-IPBD fusion proteins.

Association of proteins into dimers, trimers, or
even larger structures represents yet another aspect of
protein binding. For proteins that form such
associations, heterologous mixtures of mutant and
normal proteins will form if the mutations have not
altered the interface between subunits. For example,
Ward et al. have shown that tyrosyl tRNA synthetase
will form heterodimers when mutant and normal protein
are allowed to refold together (WARD86). See also
Hickman and Levy (HICK88) who studied the multimeric
structures of the Tet^R protein by engineering cells to
carry two different tet alleles and observing a Tet^R
phenotype arising from the complementary alleles. They
conclude that the Tet^R protein is multimeric.
Immunoglobulin formation depends on the ability of V_L
domains and V_H domains, each a part of a separately
synthesized protein, to associate independently of the
protein sequence in the antigen complementarity-
determining regions. In addition, the process of
immune complementation depends on the separability of
the binding properties of the complementarity-
determining regions from the binding properties of the
constant domains.



Auditore-Hargreaves, in US Patents 4,470,925
(AUDI84a) and 4,479,895 (AUDI84b), teaches methods of
making hybrid antibodies that depend on association of
different antibody chains. These patents teach that
alterations far from the intermolecular interface do
not alter the association.



A preferred site for insertion of the ipbd gene
into the phage osp gene is one in which: a) the IPBD
folds into its original shape, b) the OSP domains fold
into their original shapes, and c) there is no
interference between the two domains. It is not
required that the IPBD and OSP domains have any
particular spatial relationship; hence the process of
this invention does not require use of the method of US
Patent '692.

If there is a 3D model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A
functional fusion may require additional residues
between the IPBD and OSP domains to avoid unwanted
interactions between the domains. Random-sequence DNA
or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP can be inserted between
the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).

There are several methods of identifying domains.
Methods that rely on atomic coordinates have been
reviewed by Janin and Chothia (JANI85). These methods
use matrices of distances between alpha carbons
(C_alpha), dividing planes (cf. ROSE85), or buried
surface (RASH84). Chothia and collaborators have
correlated the behavior of many natural proteins with
domain structure (according to their definition).
Rashin correctly predicted the stability of a domain
comprising residues 206-316 of thermolysin (VITA84,
RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify
stable domains. (See, for example, VITA84, POTE83,
SCOT87, and PABO79.) Pabo et al. used calorimetry as
an indicator that the cI repressor from the coliphage
lambda contains two domains; they then used partial
proteolysis to determine the location of the domain
boundary.



It is generally believed that the part of the
polypeptide chain composing one domain folds almost
independently of the parts composing other domains.
There are natural proteins composed of two or more
domains for which there is strong evidence that
essentially the same domain occurs more than once, for
example ovomucoids and ovoinhibitors (SCOT87) and
kallikrein (CiiL'tiSS). Further, the same domain can
occur in several different proteins (SUDH85, GILB85,
and SCOT87).

If the only structural information available is
the amino acid sequence of the candidate OSP, we can
use the sequence to predict turns and loops. There is
a high probability that some of the loops and turns
will be correctly predicted (cf. Chou and Fasman,
(CHOU72)); these locations are also candidates for
insertion of the ipbd gene fragment.
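
The order of preference for insertion sites described in this section
can be summarized as a small decision routine. The following is a
minimal sketch only: the record type, its field names, and the residue
number in the usage example are hypothetical and are introduced purely
for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OspKnowledge:
    """Hypothetical record of what is known about a candidate OSP."""
    has_3d_model: bool = False
    # Termini shown by the 3D model to be solvent exposed: "amino", "carboxy".
    exposed_termini: List[str] = field(default_factory=list)
    # Residue positions of known domain boundaries within the OSP.
    domain_boundaries: List[int] = field(default_factory=list)
    # Residue positions of turns/loops predicted from sequence alone.
    predicted_turns: List[int] = field(default_factory=list)


def candidate_insertion_sites(osp: OspKnowledge) -> List[str]:
    """List candidate ipbd insertion sites in the order the text discusses them."""
    sites: List[str] = []
    if osp.has_3d_model:
        # With a 3D model, only termini shown to be exposed are prime candidates.
        sites += [f"{t} terminus (exposed in 3D model)" for t in osp.exposed_termini]
    else:
        # Without a 3D structure, both termini are the best first candidates.
        sites += ["amino terminus", "carboxy terminus"]
    sites += [f"domain boundary at residue {r}" for r in osp.domain_boundaries]
    sites += [f"predicted turn/loop at residue {r}" for r in osp.predicted_turns]
    return sites


if __name__ == "__main__":
    # Residue 198 is an arbitrary illustrative value, not taken from the text.
    osp = OspKnowledge(has_3d_model=True, exposed_termini=["amino"],
                       domain_boundaries=[198])
    for site in candidate_insertion_sites(osp):
        print(site)
```

The routine only orders the candidates; whether a given fusion is
functional must still be established experimentally.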

Sec. 1.3.4: In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Bacterial Spores:

Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is
unlikely that any particular random DNA sequence
coupled to the ipbd gene will produce a fusion protein
that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the ipbd gene will
produce a population that is likely to contain one or
more members that display the IPBD on the outside of a
viable phage. A display probe, similar to that defined
in 1.1.4, is constructed and random DNA sequences
cloned into appropriate sites.



Sec. 2: Choice of IPBD:



An IPBD may be chosen from naturally occurring
proteins or domains of naturally occurring proteins, or
may be designed from first principles. A designed
protein may have advantages over natural proteins if:
a) the designed protein is more stable, b) the designed
protein is smaller, and c) the charge distribution of
the designed protein can be specified more freely.



A candidate IPBD must meet the following criteria:

1) a domain exists that will remain stable under
the conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g. BPTI),

2) knowledge of the amino acid sequence is
obtainable,

3) knowledge of the identity of the residues on
the domain's outer surface, and their spatial
relationships, is obtainable, and

4) a molecule is available having specific and
high affinity for the IPBD, AfM(IPBD).

Preferably, the IPBD is no larger than necessary
because it is easier to arrange restriction sites in
smaller amino-acid sequences and because a smaller
protein minimizes the metabolic strain on the GP or the
host of the GP. The usefulness of candidate IPBDs that
meet all of these requirements depends on the
availability of the information discussed below.

Information about candidate IPBDs that will be
used to judge the suitability of the IPBD includes: 1)
a 3D structure (knowledge strongly preferred), 2) one
or more sequences homologous to the IPBD (the more
homologous sequences known, the better), 3) the pI of
the IPBD (knowledge necessary in some cases), 4) the
stability and solubility as a function of temperature,
pH and ionic strength (preferably known to be stable
over a wide range and soluble in conditions of intended
use), 5) ability to bind metal ions such as Ca++ or
Mg++ (knowledge preferred; binding per se, no
preference), 6) enzymatic activities, if any (knowledge
preferred, activity per se has uses but may cause
problems), 7) binding properties, if any (knowledge
preferred, specific binding also preferred), 8)
availability of a molecule having specific and strong
affinity (K_d < 10^-11 M) for the IPBD (preferred), 9)
availability of a molecule having specific and medium
affinity (10^-8 M < K_d < 10^-6 M) for the IPBD
(preferred), 10) the sequence of a mutant of IPBD that
does not bind to the affinity molecule(s) (preferred),
and 11) absorption spectrum in visible, UV, NMR, etc.
(characteristic absorption preferred).

If only one species of molecule having affinity
for IPBD (AfM(IPBD)) is available, it will be used to:
a) detect the IPBD on the GP surface, b) optimize
expression level and density of the affinity molecule
on the matrix (Sec. 10.1), and c) determine the
efficiency and sensitivity of the affinity separation
(Secs. 10.2 and 10.3). As noted above, however, one
would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be
used in initial detection and in determining efficiency
and sensitivity (10.2 and 10.3), and the species with
moderate affinity would be used in optimization (10.1).



There are many candidate IPBDs, 20 or more, for
which all of the above information is available or is
reasonably practical to obtain, for example, bovine
pancreatic trypsin inhibitor (BPTI, 58 residues),
crambin (46 residues), third domain of ovomucoid (56
residues), T4 lysozyme (164 residues), and azurin (128
residues). Structural information can be obtained from
X-ray or neutron diffraction studies, chemical
cross linking or labeling, modeling from known
structures of related proteins, or from theoretical
calculations. 3D structural information obtained by X-
ray diffraction, neutron diffraction or NMR is
preferred because these methods allow localization of
almost all of the atoms to within defined limits.

Most of the PBDs derived from a PPBD according to
the process of the present invention affect residues
having side groups directed toward the solvent.
Reidhaar-Olson and Sauer (REID88) found that exposed
residues can accept a wide range of amino acids, while
buried residues are more limited in this regard.
Surface mutations typically have only small effects on
melting temperature of the PBD, but may reduce the
stability of the PBD. Hence the chosen IPBD should
have a high melting temperature (60°C acceptable, the
higher the better) and be stable over a wide pH range
(8.0 to 3.0 acceptable; 11.0 to 2.0 preferred), so that
the SBDs derived from the chosen IPBD by mutation and
selection-through-binding will retain sufficient
stability. Preferably, the substitutions in the IPBD
yielding the various PBDs do not reduce the melting
point of the domain below 50°C. Mutations may arise
that increase the stability of SBDs relative to the
IPBD, but the process of the present invention does not
depend upon this occurring.

Two general characteristics of the target
molecule, size and charge, make certain classes of
IPBDs more likely than other classes to yield
derivatives that will bind specifically to the target.
Because these are very general characteristics, one can
divide all targets into six classes: a) large positive,
b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small
collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate
IPBD for any chosen target.

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; Sec. 2.1 gives
criteria that relate target size and charge to the
choice of IPBD.

Sec. 2.1.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule
a preferred embodiment of the IPBD is a small protein
such as BPTI from Bos taurus (58 residues), crambin
from rape seed (46 residues), or the third domain of
ovomucoid from Coturnix coturnix japonica (Japanese
quail) (56 residues) (PAPA82), because targets from
this class have clefts and grooves that can accommodate
small proteins in highly specific ways. If the target
is a macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small
molecule. Extended macromolecules with defined 3D
structure, such as collagen, should be treated as large
molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a
protein the size of ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus
(129 residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for all of the proteins
listed. Genes encoding proteins as large as T4
lysozyme can be manipulated by standard techniques for
the purposes of this invention.

If the target is a mineral, insoluble in water,
one must consider the nature of the molecular surface
of the mineral. Minerals that have smooth surfaces,
such as crystalline silicon, require medium to large
proteins, such as ribonuclease, as IPBD in order to
have sufficient contact area and specificity. Minerals
with rough, grooved surfaces, such as zeolites, could
be bound either by small proteins, such as BPTI, or
larger proteins, such as T4 lysozyme.
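
The size guidance of this section can be collected into a lookup
table. The following is a minimal sketch: the protein names and
residue counts are those quoted in the text, while the target labels
used as dictionary keys and the function name are hypothetical
conveniences, not terminology of the invention.

```python
# Example IPBDs and residue counts as quoted in the text above.
SMALL_IPBDS = [
    ("BPTI", 58),
    ("crambin", 46),
    ("ovomucoid third domain", 56),
]
LARGER_IPBDS = [
    ("bovine ribonuclease", 124),
    ("Aspergillus oryzae ribonuclease", 104),
    ("hen egg white lysozyme", 129),
    ("azurin", 128),
    ("T4 lysozyme", 164),
]

# Hypothetical target labels mapped to the class of IPBD the text prefers.
_RULES = {
    "compact macromolecule": SMALL_IPBDS,              # e.g. a protein
    "extended structured macromolecule": SMALL_IPBDS,  # e.g. collagen, treated as large
    "unstructured macromolecule": LARGER_IPBDS,        # e.g. starch, treated as small
    "small molecule": LARGER_IPBDS,                    # e.g. a steroid
    "smooth mineral": LARGER_IPBDS,                    # e.g. crystalline silicon
    "rough mineral": SMALL_IPBDS + LARGER_IPBDS,       # e.g. zeolites, either class
}


def suggest_ipbds(target_kind: str) -> list:
    """Return (name, residue count) pairs suggested for a kind of target."""
    try:
        return list(_RULES[target_kind])
    except KeyError:
        raise ValueError(f"unknown target kind: {target_kind!r}") from None


if __name__ == "__main__":
    for kind in _RULES:
        names = ", ".join(name for name, _ in suggest_ipbds(kind))
        print(f"{kind}: {names}")
```

The table encodes only the size criterion; the charge criterion and
the other considerations of Sec. 2.1 would narrow the list further.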

Sec. 2.1.2: Influence of target charge on choice of
IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred
that, under the conditions of intended use, the IPBD
and the target molecule either have opposite charge or
that one of them is neutral. In some cases it has been
observed that protein molecules bind in such a way that
like charged groups are juxtaposed by including
oppositely charged counter ions in the molecular
interface. Thus, inclusion of counter ions can reduce
or eliminate electrostatic repulsion and the user may
elect to include ions in the eluants used in the
affinity separation step. Polyvalent ions are more
effective at reducing repulsion than monovalent ions.

Sec. 2.1.3: Other considerations in the choice of
IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if
the IPBD were T4 lysozyme and the GP were E. coli cells
or M13, we would need to inactivate the lysozyme
because otherwise it would lyse the cells. If, on the
other hand, the GP were PhiX174, then inactivation of
lysozyme may not be needed because T4 lysozyme can be
overproduced inside E. coli cells without detrimental
effects and PhiX174 forms intracellularly. It is
preferred to inactivate enzyme IPBDs that might be
harmful to the GP or its host by substituting mutant
amino acids at one or more residues of the active site.
It is permitted to vary one or more of the residues
that were changed to abolish the original enzymatic
activity of the IPBD. Those GPs that receive osp-pbd
genes encoding an active enzyme may die, but the
majority of sequences will not be deleterious.

If the binding protein is intended for therapeutic
use in humans or animals, the IPBD may be chosen from
proteins native to the designated recipient to minimize
the possibility of antigenic reactions.



Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10
kb. The size of the OCV affects the stability of the
OCV and its derivatives, and the copy number thereof.
An OCV which is stable, even after insertion of at
least 1 kb DNA, is sought. A multicopy OCV is also of
interest. It is desirable that cassette mutagenesis be
practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. Finally, the OCV preferably
carries a selectable marker.

If a suitable OCV does not already exist, it may
be engineered by manipulation of available vectors.

In the cases of bacterial cells and bacterial
spores, the bacterial chromosome could be used as the
OCV. Plasmids are, however, preferred because genes on
plasmids are much more easily constructed and mutated
than are genes in the bacterial chromosome. When
bacteriophage are to be used, the osp-ipbd gene must be
inserted into the phage genome. The synthetic osp-ipbd
genes can be constructed in small vectors and
transferred to the GP genome when complete.

Phage such as M13 do not confer antibiotic
resistance on the host so that one can not select for
cells infected with M13. An antibiotic resistance gene
can be engineered into the M13 genome (HINE80). More
virulent phage, such as PhiX174, make discernable
plaques that can be picked, in which case a resistance
gene is not essential; furthermore, there is no room in
the PhiX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.



It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).
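
One of the OCV criteria in this section, the availability of
restriction enzymes that do not cut the vector, lends itself to a
simple sequence scan. The following is a minimal sketch under stated
assumptions: the table holds only eight common enzymes with
palindromic recognition sites (so scanning one strand suffices), the
function name is illustrative, and the sequences in the usage example
are toy strings rather than real vectors.

```python
# Recognition sequences of a few common type II restriction enzymes.
# All are palindromic, so a single-strand search covers both strands.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
    "KpnI": "GGTACC",
    "SacI": "GAGCTC",
}


def non_cutters(vector_seq: str, sites: dict = RECOGNITION_SITES,
                circular: bool = True) -> list:
    """Return the sorted names of enzymes whose site is absent from the vector."""
    seq = vector_seq.upper()
    if circular and sites:
        # Append the start of the sequence so sites spanning the origin are found.
        longest = max(len(site) for site in sites.values())
        seq += seq[:longest - 1]
    return sorted(name for name, site in sites.items() if site not in seq)


if __name__ == "__main__":
    toy_vector = "ATGGAATTCCC"  # toy sequence containing one EcoRI site
    print(non_cutters(toy_vector))
```

A real assessment would use the complete vector sequence and a much
larger enzyme table; the sketch only shows the shape of the check.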

Sec. 4: Designing the osp-ipbd gene insert:

Having chosen an IPBD, a GP, a strategy for getting
the IPBD onto the GP surface, and a cloning vector, we
now turn to the design of a suitably regulated gene.
In this section, we design an amino acid sequence that
will cause the IPBD to appear on the GP surface when it
is expressed. This amino acid sequence may determine
the entire coding region of the osp-ipbd gene, or it
may contain only the ipbd sequence adjoining
restriction sites into which random DNA will be cloned
(Sec. 6.2).

We will now consider the transcriptional
regulation of the osp-ipbd gene; the design of the DNA
encoding of amino acid sequences; the organization of
synthesis; the methods of DNA synthesis and
purification; and the actual gene synthesis and
cloning.

The actual gene may be: a) completely synthetic,
b) a composite of natural and synthetic DNA, or c) a
composite of natural DNA fragments. The important
point is that the pbd segment, derived from the ipbd
segment, be easily genetically manipulated in the ways
described in Part III. A synthetic ipbd segment is
preferred because it allows greatest control over
placement of restriction sites. Primers complementary
to regions abutting the osp-ipbd gene on its 3' flank
and to parts of the osp-ipbd gene that are not to be
varied are needed for sequencing.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Now we consider regulation of the osp-ipbd gene to
enable modulation of expression. The two important
questions are: a) how much OSP-IPBD do we need on each
GP, and b) how accurately must we regulate the amount?

The essential function of the affinity separation
is to separate GPs that bear PBDs (derived from IPBD)
having high affinity for the target from GPs bearing
PBDs having low affinity for the target. If the
elution volume of a GP depends on the number of PBDs on
the GP surface, then a GP bearing many PBDs with low
affinity, GP(PBD_w), might co-elute with a GP bearing
fewer PBDs with high affinity, GP(PBD_s). Assume that
both GP(PBD_w) and GP(PBD_s) bind to the column under
some condition, such as low salt. If a gradient of
some solute, such as increasing salt, changes the
conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good
separation in Sec. 15. If the amount of PBD/GP had an
effect on the elution volume of the GP from the
affinity matrix, then we would need to regulate the
amount of PBD/GP very accurately. The following
analysis shows that there is no strong linear effect of
IPBD/GP on elution volume and assumes only: a) that all
GPs are the same size, b) that interactions between the
PBDs and the affinity matrix dominate differential
elution of GPs, c) that the system is at equilibrium,
and d) that all PBDs on any one GP are identical.

If $N_b$ identical PBDs on a GP each have access to
target molecules, and each PBD has a free energy of
binding to the target of $\Delta G_b$, then the total
free energy of binding is

$$\Delta G_b^{tot} = N_b \cdot \Delta G_b .$$

$\Delta G_b$ will be a function of several parameters of
the solvent, such as: 1) concentration of ions, 2) pH,
3) temperature, 4) concentration of neutral solutes
such as sucrose, glucose, ethanol, etc., 5) specific
ions, such as calcium, acetate, benzoate, nicotinate,
etc. If conditions are altered during affinity
separation so that $\Delta G_b$ approaches zero,
$\Delta G_b^{tot}$ approaches zero $N_b$ times faster. As
$\Delta G_b^{tot}$ goes to or above zero, the packages
will dissociate from the immobilized target molecules
and be eluted.



GPs bearing more PBDs have a sharper transition
between bound and unbound than packages with fewer of
the same PBDs. For equilibrium conditions, the mid-
point of the transition is determined only by the
solution conditions that bring the individual
interactions to zero free-energy. The number of
PBDs/GP determines the sharpness of the transition.
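
The qualitative argument above can be illustrated numerically. The
following is a minimal sketch, assuming a simple two-state equilibrium
in which the bound fraction follows a Boltzmann weighting of the total
binding free energy, taken as the number of PBDs per package times the
per-PBD free energy. The two-state form, the function name, and the
numerical values are illustrative assumptions and are not part of the
analysis in the text.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def fraction_bound(n_pbd: int, delta_g_b: float, temperature: float = 298.15) -> float:
    """Bound fraction in an assumed two-state model.

    delta_g_b is the per-PBD binding free energy in kJ/mol (negative
    favours binding); the total is n_pbd * delta_g_b.
    """
    x = n_pbd * delta_g_b / (R * temperature)
    # Numerically stable evaluation of 1 / (1 + exp(x)).
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Sweep the per-PBD free energy through zero for few and many PBDs.
    for dg in (-1.0, -0.2, 0.0, 0.2, 1.0):
        row = ", ".join(f"N={n}: {fraction_bound(n, dg):.3f}" for n in (5, 50))
        print(f"dG_b={dg:+.1f} kJ/mol -> {row}")
```

In this model the half-bound point sits at a per-PBD free energy of
zero whatever the number of PBDs, while a larger number makes the
switch between bound and unbound steeper, in line with the statement
that the number of PBDs/GP sets the sharpness and not the mid-point.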



It should also be noted that the number of PBDs/GP
is usually influenced by physiological conditions so
that a sample of genetically identical GP(PBD)s may
contain GPs having different numbers of PBDs on the GP
surface. In a population of GP(vgPBD)s each PBD
sequence will appear on more than one GP, and the
actual number of PBDs/GP will vary from GP to GP within
some range. Within a variegated population of PBDs,
let PBD_x be the PBD with maximum affinity for the
target. If there is a linear effect of number of
PBDs/GP, then the GPs having the greatest number of
PBD_x will be most retarded on the column. When we
culture the enriched population obtained either as an
effluent from the column or as an inoculum of matrix
material from the column, the GP(PBD_x) will be
amplified and give rise to new GP(PBD_x)s having varying
numbers of PBD_x/GP. Thus the affinity separation
process of the present invention could tolerate a
linear effect of number of PBDs/GP on the elution
volume of the GP(PBD) unless strong binding to target
fortuitously causes the PBD to be displayed on the GP
only in low number. It is extremely unlikely that all
PBDs that bind to the target will also be incapable of
display in large amounts on the GP surface.



According to the above analysis, there is no
linear effect on elution volume from the number of
IPBDs/GP, hence need for highly accurate regulation of
IPBD/GP is not anticipated. The analysis above assumes
that GP(IPBD)s are in equilibrium between solution in
buffer and bound to the affinity matrix. Rate of
elution may be an important parameter in column
affinity chromatography. In batch elution from an
affinity matrix or elution from an affinity plate, the
time that each buffer is in contact with the affinity
material may be an important variable. The density of
affinity molecules on the matrix is an important
variable in optimizing the affinity separation.
Because the analysis above is qualitative, in Sec. 10
of the preferred embodiment we experimentally optimize:
1) the density of IPBD on the GP surface, 2) the
density of affinity molecules on the affinity matrix,
3) the initial ionic strength, 4) the elution rate, and
5) the quantity of GP/(volume of matrix) to be loaded
on the column.

A number of promoters are known that can be
controlled by specific chemicals added to the culture
medium. For example, the lacUV5 promoter is induced if
isopropylthiogalactoside is added to the culture
medium, for example, at between 1.0 uM and 10.0 mM.
Hereinafter, we use "XINDUCE" as a generic term for a
chemical that induces expression of a gene.
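
When the level of XINDUCE is to be varied experimentally, a
logarithmically spaced series across the quoted range is one
convenient way to choose test concentrations. The following is a
minimal sketch, assuming the range of 1.0 uM to 10.0 mM given above;
the function name and the choice of log spacing are illustrative
assumptions, not a protocol from the text.

```python
import math


def inducer_series(low_molar: float = 1.0e-6, high_molar: float = 1.0e-2,
                   points_per_decade: int = 1) -> list:
    """Return log-spaced inducer concentrations (mol/L) from low to high."""
    if low_molar <= 0 or high_molar <= low_molar:
        raise ValueError("require 0 < low_molar < high_molar")
    if points_per_decade < 1:
        raise ValueError("points_per_decade must be at least 1")
    decades = math.log10(high_molar / low_molar)
    steps = round(decades * points_per_decade)
    return [low_molar * 10 ** (i / points_per_decade) for i in range(steps + 1)]


if __name__ == "__main__":
    for conc in inducer_series(points_per_decade=2):
        print(f"{conc:.2e} M")
```

With the default arguments the series covers four decades in
five steps; a finer series can be requested near the concentration
at which the number of viable packages begins to fall.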

Transcriptional regulation of gene expression is
best understood and most effective, so we focus our
attention on the promoter. If transcription of the
osp-ipbd gene is controlled by the chemical XINDUCE,
then the number of OSP-IPBDs per GP increases for
increasing concentrations of XINDUCE until a fall-off
in the number of viable packages is observed or until
sufficient IPBD is observed on the surface of harvested
GP(IPBD)s. The attributes that affect the maximum
number of OSP-IPBDs per GP are primarily structural in
nature. There may be steric hindrance or other
unwanted interactions between IPBDs if OSP-IPBD is
substituted for every wild-type OSP. Excessive levels
of OSP-IPBD may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs,
as few as five copies of a protein having affinity for
another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).



Another consideration of promoter regulation is
that it is useful later to know the range of regulation
of the osp-ipbd gene (Sec. 8). In particular, one should
determine how nearly the absence of XINDUCE leads to
the absence of IPBD on the GP surface; a non-leaky
promoter is preferred. Non-leakiness is useful: a) to
show that affinity of GP(osp-ipbd)s for AfM(IPBD) is
due to the osp-ipbd gene, and b) to allow growth of
GP(osp-ipbd) in the absence of XINDUCE if the expression
of osp-ipbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacIq repressor is a preferred
example.

Sec. 4.2: DNA sequence design:



The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

Having specified that the amount of IPBD/GP is to
be experimentally optimized and that well-studied
available regulatory mechanisms applied to the osp-ipbd
gene are sufficient, we now consider design of a DNA
sequence. If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.2) from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.

The osp-ipbd gene need not be synthesized in toto;
parts of the gene may be obtained from nature. One
may use any genetic engineering method to produce the
correct gene fusion, so long as one can easily and
accurately direct mutations to specific sites in the
pbd DNA subsequence (Sec. 14.1). In all of the methods
of mutagenesis considered in the present invention,
however, it is necessary that the DNA sequence for the
osp-ipbd gene be different from any other DNA in the
OCV. The degree and nature of difference needed is
determined by the method of mutagenesis to be used in
Sec. 14.1. If the method of mutagenesis is to be
replacement of subsequences coding for the PBD with
vgDNA, then the subsequences to be mutagenized must be
bounded by restriction sites that are unique with
respect to the rest of the OCV. If single-stranded-
oligonucleotide-directed mutagenesis is to be used,
then the DNA sequence of the subsequence coding for the
IPBD must be unique with respect to the rest of the
OCV.

The sequences of regulatory parts of the gene are
taken from the sequences of natural regulatory
elements: a) promoters, b) Shine-Dalgarno sequences,
and c) transcriptional terminators. Regulatory
elements could also be designed from knowledge of
consensus sequences of natural regulatory regions. The
sequences of these regulatory elements are connected to
the coding regions; restriction sites are also inserted
in or adjacent to the regulatory regions to allow
convenient manipulation.



The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of a IPBD on the surface
of a GP, b) change of charge on a IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.



Specific DNA sequence assignment:



A computer program may be used to construct an
ambiguous DNA sequence coding for an amino-acid
sequence given by the user. That is, the DNA sequence
contains codes for all possible DNA sequences that
produce the stated amino acid sequence. The codes used
in the ambiguous DNA are shown in Table 1. An example
of an ambiguous DNA sequence is given in Table 3.
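
The back-translation step described above can be sketched
as follows. This is a minimal illustration, not the program
referred to in the text: it assumes the standard genetic
code and the IUPAC ambiguity letters (Table 1 is not
reproduced in this section and may use different codes),
and the function names are hypothetical. Note that a single
per-position code over-covers the six-codon amino acids
(L, R, S); for example YTN for leucine also admits TTY
(phenylalanine), so a real program would have to track the
allowed codon sets as well.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity letters keyed by the sorted set of bases they stand for
# (an assumption; the patent's Table 1 may define other symbols).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def codons_for(aa):
    """All codons of the standard code that encode residue `aa`."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def ambiguous_codon(aa):
    """Per-position ambiguity code covering every codon for `aa`."""
    codons = codons_for(aa)
    return "".join(
        IUPAC["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


def ambiguous_dna(protein):
    """Ambiguous DNA (one codon per residue, space separated)."""
    return " ".join(ambiguous_codon(aa) for aa in protein.upper())


print(ambiguous_dna("KSLW"))
```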



The user supplies lists of restriction enzymes
that: a) do not cut the OCV, and b) cut the OCV only
once or twice. For each enzyme the program reads: a)
the name, b) the recognition sequence, c) the cutting
pattern, and d) the names of suppliers. The ambiguous
DNA sequence coding for the stated amino acid sequence
is examined for places that recognition sites for any
of the given enzymes could be created without altering
the amino-acid sequence. A master table of enzymes
could be obtained from the catalogues of enzyme
suppliers such as the suppliers listed in Table 4 or
other sources, such as Roberts' annual review of
restriction enzymes in Nucleic Acids Research.
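
The search for places where a recognition site could be
created without altering the amino-acid sequence might be
sketched as below. This is an illustrative sketch only: it
assumes the standard genetic code, handles only unambiguous
recognition sequences, and uses a hypothetical six-residue
peptide (MAKSLW) and hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons of the standard code that encode residue `aa`."""
    return [c for c, a in CODON_TABLE.items() if a == aa]


def silent_sites(protein, site):
    """1-based nucleotide positions at which `site` could begin
    without changing any residue of `protein`."""
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        end = start + len(site)
        compatible = True
        for ci in range(start // 3, (end - 1) // 3 + 1):
            # Bases of codon ci that the recognition site would dictate.
            fixed = {
                k: site[3 * ci + k - start]
                for k in range(3)
                if start <= 3 * ci + k < end
            }
            # Some synonymous codon must agree with every dictated base.
            if not any(
                all(c[k] == b for k, b in fixed.items())
                for c in codons_for(protein[ci])
            ):
                compatible = False
                break
        if compatible:
            hits.append(start + 1)
    return hits


# Hind III recognizes AAGCTT; MAKSLW is a made-up test peptide.
print(silent_sites("MAKSLW", "AAGCTT"))
```

Because codons are chosen independently, each codon that the
site overlaps can be checked on its own, which keeps the
scan linear in the length of the gene for a fixed enzyme.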

Each potential recognition site causes a record
similar to the following to be written:

Hind III -S,B,M,I,N,P> Loc=9 T=9 B=13 Dir=n Cut elective
Protein seq  : k  - s  - l  - w
aa #s        : 3    4    5    6       Cut # 1/6
possible DNA : AAr  AGy  TTr  TGG   5'NNNNNA AGCTTNNNNN3'
cutter       :   A  AGC  TT         3'NNNNNTTCGA ANNNNN5'
result       : AAA  AGC  TTr


The top line identifies the enzyme, Hind III in
this example, and the supplier (through codes given in
Table 4); "Loc=9" indicates that recognition begins
with nucleotide 9; "T=9" indicates that the antisense
strand (top) of DNA is cut after base 9; "B=13"
indicates that the sense strand (or bottom strand, not
shown except in the dsDNA on the right) is cut between
bases 13 and 14 (reading left to right). "Dir=n"
indicates that recognition is "normal". Hind III
recognizes palindromic sequences, as do most
restriction enzymes. Some enzymes have asymmetric
recognition, however, and cut to one side; for these
enzymes, the recognition could be "normal" or
"reversed" depending on whether the enzyme cuts to the
right or left of the recognition site. Here,
unambiguous stretches that require certain restriction
sites are labeled as "obligatory"; those that are
elective are so labeled.



The second and third lines show the amino-acid
sequence and residue numbers for which this region of
DNA codes. The notation "Cut # 1/6" indicates that
this is the first of six possible Hind III sites.

The ambiguity of the DNA between the restriction
sites is resolved from the following considerations.
If the given amino acid sequence occurs in the
recipient organism, and if the DNA sequence of the gene
in the organism is known, then, preferably, we maximize
the differences between the engineered and natural
genes to minimize the potential for recombination. In
addition, the following codons are poorly translated in
E. coli and, therefore, are avoided if possible:
cta(L), cga(R), cgg(R), and agg(R). For other host
species, different codon restrictions would be
appropriate. Finally, long repeats of any one base are
prone to mutation and thus are avoided. Balancing
these considerations, we can design a DNA sequence.
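
One simple way to balance these considerations is a greedy,
codon-by-codon choice, sketched below. This is only an
illustration under stated assumptions: the standard genetic
code, the four avoided codons listed in the text, a
hypothetical run-length limit (max_run = 4), and
hypothetical function names. It ignores restriction sites
already fixed in the design, which a real procedure would
have to respect.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codons the text lists as poorly translated in E. coli.
AVOID = {"CTA", "CGA", "CGG", "AGG"}


def codons_for(aa):
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def translate(dna):
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_run(seq):
    """Length of the longest stretch of one repeated base."""
    if not seq:
        return 0
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best


def resolve(protein, native_dna, max_run=4):
    """Pick one codon per residue: skip AVOID codons when possible,
    refuse long single-base runs, then differ from the native gene
    at as many positions as possible."""
    out = ""
    for i, aa in enumerate(protein):
        native = native_dna[3 * i:3 * i + 3]
        candidates = [c for c in codons_for(aa) if c not in AVOID]
        candidates = candidates or codons_for(aa)

        def score(codon):
            run_ok = longest_run(out + codon) <= max_run
            differences = sum(x != y for x, y in zip(codon, native))
            return (run_ok, differences)

        out += max(candidates, key=score)
    return out


designed = resolve("KSLWR", "AAAAGCTTGTGGAGG")
print(designed, translate(designed))
```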



Sec. 5.1: Gene synthesis:

Now we consider ways to divide the synthesis of
the designed gene into manageable segments. The
present invention is not limited as to how a designed
DNA sequence is divided for easy synthesis. The
following procedure is an example of how such synthesis
might be managed.

An established method is to synthesize both
strands of the entire gene in overlapping segments of
20 to 50 nucleotides (nts) (THER88). Below we provide
an alternative method that is more suitable for
synthesis of vgDNA. This method is similar to methods
published by Oliphant et al. (OLIP86 and OLIP87) and
Ausubel et al. (AUSU87). Our adaptation of this method
differs from previous methods in that we: a) use two
synthetic strands, and b) do not cut the extended DNA
in the middle. Our goals are: a) to produce longer
pieces of dsDNA than can be synthesized as ssDNA on
commercial DNA synthesizers, and b) to produce strands
complementary to single-stranded vgDNA. By using two
synthetic strands, we remove the requirement for a
palindromic sequence at the 3' end.



DNA synthesizers can currently produce oligo-nts
of lengths up to 100 nts in reasonable yield, M_DNA =
100. The parameters N_w (the length of overlap needed
to obtain efficient annealing) and N_s (the number of
spacer bases needed so that a restriction enzyme can
cut near the end of blunt-ended dsDNA) are determined
by DNA and enzyme chemistry. N_w = 10 and N_s = 5 are
reasonable values. Larger values of N_w and N_s are
allowed but add to the length of ssDNA that must be
synthesized and reduce the net length of dsDNA that can
be produced.



Let A_L be the actual length of dsDNA to be
synthesized, including any spacers. A_L must be no
greater than (2 M_DNA - N_w). Let Q_w be the number of
nts that the overlap window can deviate from center.
Q_w is never negative. It is preferred that the two
fragments be approximately the same length so that the
amounts synthesized will be approximately equal. This
preference may be overridden by other considerations.
The overall yield of dsDNA is usually dominated by the
synthetic yield of the longer oligo-nt.
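
The length bookkeeping can be sketched as follows. The
expression used for Q_w, (2 M_DNA - N_w - A_L)/2, is an
assumption derived from the stated constraints (the two
oligo-nts together span A_L + N_w nts, and neither may
exceed M_DNA nts); it is not quoted from the text, and the
function names are hypothetical.

```python
def overlap_slack(a_l, m_dna=100, n_w=10):
    """Q_w: how far (in nts) the overlap window may move off center
    before the longer oligo-nt would exceed m_dna nts."""
    limit = 2 * m_dna - n_w
    if a_l > limit:
        raise ValueError(f"dsDNA of {a_l} nts exceeds the limit of {limit}")
    return (limit - a_l) // 2


def oligo_lengths(a_l, shift=0, n_w=10):
    """Lengths of the two synthetic strands when the overlap window is
    moved `shift` nts off center; together they span a_l + n_w nts."""
    first = (a_l + n_w) // 2 + shift
    second = a_l + n_w - first
    return first, second


q_w = overlap_slack(160)           # slack for a 160-nt duplex
print(q_w, oligo_lengths(160), oligo_lengths(160, shift=q_w))
```

With the defaults M_DNA = 100 and N_w = 10, a 190-nt duplex
leaves no slack (Q_w = 0) and requires two 100-nt strands.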



We use the following procedure to generate dsDNA
of lengths up to (2 M_DNA - N_w) nts through the use of
Klenow fragment to extend synthetic ssDNA fragments
that are not more than M_DNA nts long. When a pair of
long oligo-nts, complementary for N_w nts at their 3'
ends, are annealed there will be a free 3' hydroxyl and
a long ssDNA chain continuing in the 5' direction on
either side. We will refer to this situation as a 5'
superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_w to
N_w+4 nts near the center of the dsDNA to be
synthesized; this region is called the overlap
(typically, N_w is 10),

2) synthesizing a ssDNA molecule that comprises
that part of the anti-sense strand from its 5' end
up to and including the overlap,

3) synthesizing a ssDNA molecule that comprises
that part of the sense strand from its 5' end up
to and including the overlap,

4) annealing the two synthetic strands that are
complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow
fragment and all four deoxynucleotide
triphosphates.
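
The five steps above can be mimicked in silico to check a
design before synthesis. The sketch below is an idealized
string model, not a description of the wet procedure: it
assumes perfect annealing and complete extension, and the
function names and the 40-nt test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a strand written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_oligos(top, start, n_w=10):
    """Steps 1-3: choose the overlap top[start:start+n_w] and return the
    two strands to synthesize, each written 5'-to-3'."""
    overlap = top[start:start + n_w]
    if overlap == revcomp(overlap):
        raise ValueError("overlap must not be palindromic")
    oligo_a = top[:start + n_w]       # one strand, 5' end through overlap
    oligo_b = revcomp(top[start:])    # other strand, 5' end through overlap
    return oligo_a, oligo_b


def klenow_fill(oligo_a, oligo_b, n_w=10):
    """Steps 4-5: anneal at the 3' ends and extend both 3' hydroxyls;
    returns the top strand of the resulting blunt duplex."""
    if oligo_a[-n_w:] != revcomp(oligo_b[-n_w:]):
        raise ValueError("3' ends are not complementary over n_w nts")
    return oligo_a + revcomp(oligo_b[:-n_w])


design = "GATTACAGGCTTAACCGGATCCGTTAGCAATGCCGTATGC"
a, b = split_oligos(design, start=15)
print(len(a), len(b), klenow_fill(a, b) == design)
```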

Because M_DNA is not rigidly fixed at 100, the
current limits of 190 (= 2 M_DNA - N_w) nts overall and
100 in each fragment are not rigid, but can be exceeded
by 5 or 10 nts. Going beyond the limits of 190 and 100
will lead to lower yields, but these may be acceptable.

Restriction enzymes do not cut well at sites
closer than about five base pairs from the end of blunt
dsDNA fragments (OLIP87). Therefore N_s nts (with N_s
typically set to 5) of spacer are added to ends that we
intend to cut with a restriction enzyme. If the
plasmid is to be cut with a blunt-cutting enzyme, then
we do not add any spacer to the corresponding end of
the dsDNA fragment.

To choose the optimum site of overlap for the
oligo-nt fragments, first consider the anti-sense
strand of the DNA to be synthesized, including any
spacers at the ends, written (in upper case) from 5' to
3' and left-to-right. N.B.: The N_w nt long overlap
window can never include bases that are to be
variegated. N.B.: The N_w nt long overlap should not be
palindromic lest single DNA molecules prime themselves.
Place a N_w nt long window as close to the center of the
anti-sense sequence as possible. Check to see whether
one or more codons within the window can be changed to
increase the GC content without: a) destroying a needed
restriction site, b) changing amino acid sequence, or
c) making the overlap region palindromic. If possible,
change some AT base pairs to GC pairs. If the GC
content of the window is less than 50%, slide the
window right or left as much as Q_w nts to maximize the
number of C's and G's inside the window, but without
including any variegated bases. For each trial setting
of the overlap window, maximize the GC content by
silent codon changes, but do not destroy wanted
restriction sites or make the overlap palindromic. If
the best setting still has less than 50% GC, enlarge
the window to N_w+2 nts and place it within five nts of
the center to obtain the maximum GC content. If
enlarging the window one or two nts will increase the
GC content, do so, but do not include variegated bases.
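
The window-sliding part of this search can be sketched as
below. This is a partial illustration only: it scores fixed
sequence windows and does not attempt the silent codon
changes or window enlargement described above. The function
names, the test sequence, and the 0-based position set used
to mark variegated bases are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap(seq, variegated, n_w=10, q_w=5):
    """Return (gc_count, start, window) for the best overlap window.

    Windows are tried from the center outward, up to q_w nts off center;
    a window is rejected if it touches a variegated position (0-based
    indices in `variegated`) or is palindromic. Ties keep the window
    nearest the center. Returns None if no window qualifies."""
    center = (len(seq) - n_w) // 2
    best = None
    for shift in sorted(range(-q_w, q_w + 1), key=abs):
        start = center + shift
        if start < 0 or start + n_w > len(seq):
            continue
        if any(p in variegated for p in range(start, start + n_w)):
            continue
        window = seq[start:start + n_w]
        if window == revcomp(window):
            continue
        gc = sum(base in "GC" for base in window)
        if best is None or gc > best[0]:
            best = (gc, start, window)
    return best


seq = "AT" * 8 + "GGCCGTCCGA" + "AT" * 5
print(best_overlap(seq, variegated={15}))
```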



Underscore the anti-sense strand from the 5' end
up to the right edge of the window. Write the
complementary sense sequence 3'-to-5' and left-to-right
and in lower case letters, under the anti-sense strand
starting at the left edge of the window and continuing
all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense
strand and the part of the sense strand that we wrote.
These two fragments, complementary over the length of
the window of high GC content, are mixed in equimolar
quantities and annealed. These fragments are extended
with Klenow fragment and all four deoxynucleotide
triphosphates to produce ds blunt-ended DNA. This DNA
can be cut with appropriate restriction enzymes to
produce the cohesive ends needed to ligate the fragment
to other DNA.



Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any
particular method of DNA synthesis or construction.
The following procedures exemplify one way to achieve
the goals of the present invention.

DNA is synthesized on a Milligen 7500 DNA
synthesizer (Milligen, a division of Millipore
Corporation, Bedford, MA) by standard procedures.
Software to control the synthesizer and to keep records
of each synthesis is supplied by Milligen.



The following reagents are supplied by Milligen: 



1) 2.1% 1H-tetrazole in acetonitrile,

2) 3% (v/v) dichloroacetic acid in
dichloromethane,

3) Acetic anhydride in 2,6 lutidine/acetonitrile
(1:1:8),

4) 6.5% dimethylaminopyridine in acetonitrile,

5) 0.1M iodine in 2,6
lutidine/water/tetrahydrofuran (8:8:84),

6) 3% (v/v) triethylamine in acetonitrile,

7) DMT-dAdenosine (Bz) cyanoethylphosphoramidite

8) DMT-dCytidine (Bz) cyanoethylphosphoramidite

9) DMT-dGuanosine (iBu) cyanoethylphosphoramidite

10) DMT-dThymidine cyanoethylphosphoramidite

11) Acetonitrile, anhydrous

Tetrazole and acetonitrile are stored over
molecular sieves to sequester water.

Phosphoramidites are dissolved in anhydrous
acetonitrile (Milligen) at 0.1 g/ml. All other
acetonitrile used in the syntheses is "Low-water
Acetonitrile" supplied by J. T. Baker Chemical Company
(Phillipsburg, NJ). Synthesis columns containing
supports charged with an initial base for each of A, C,
G, and T are obtained from Milligen in two types, high-
loading and low-loading. High-loading columns are used
for syntheses of oligo-nts containing up to 60 bases
and contain between 35 and 70 micromoles of amidite/g
of support. The exact amount varies from lot to lot.
Low-loading columns containing between 4 and 7
micromoles amidite/g support are used for syntheses of
oligo-nts containing 60 bases or more.

The Milligen 7500 has seven vials from which
phosphoramidites may be taken. Normally, the first
four contain A, C, T, and G. The other three vials
may contain unusual bases such as inosine or mixtures
of bases, the so-called "dirty bottle". The standard
software allows programmed mixing of two, three, or
four bases in equimolar quantities.
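
Equimolar mixing at each codon position determines the
distribution of amino acids at a variegated codon. The
sketch below computes that distribution for any choice of
base mixtures, assuming the standard genetic code and
exactly equimolar incorporation; the function name is
hypothetical, and the NNK mixture (N = A/C/G/T, K = G/T) is
used only as a familiar illustration, not as a scheme taken
from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_distribution(mix1, mix2, mix3):
    """Fraction of each residue ('*' = stop) encoded when the three codon
    positions are synthesized from equimolar mixtures of the given bases."""
    counts = Counter(
        CODON_TABLE[a + b + c] for a, b, c in product(mix1, mix2, mix3)
    )
    total = sum(counts.values())
    return {aa: Fraction(n, total) for aa, n in counts.items()}


nnk = residue_distribution("ACGT", "ACGT", "GT")
for aa, fraction in sorted(nnk.items()):
    print(aa, fraction)
```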



When a synthesis is complete, the DNA is removed
from the support by incubating the supports in 1 ml of
fresh 28-30% ammonium hydroxide solution (EM Science, a
division of EM Industries, Inc., Cherry Hill, NJ) for
15 hours at 50 degrees C. The solution is dried under
vacuum and the DNA resuspended in 200 microliters of
HPLC-grade water (Baker-Analyzed Reagent (R), J.T.
Baker Chemical Co.) and is purified by high-pressure
liquid chromatography (HPLC) or PAGE.

With low-loading supports, a *5-base-long oligo-nt
is typically obtained at 1-2% of theoretical yield,
i.e. 10 ug; a 100-base-long oligo-nt is typically
obtained in 0.5% of theoretical yield, i.e. 5 ug. With
high-loading supports, 1 mg of a 20-base-long oligo-nt
is typically obtained.

The present invention is not limited to any
particular method of purifying DNA for genetic
engineering. HPLC is used for both oligo-nts and
fragments of several kb. Alternatively, agarose gel
electrophoresis and electroelution on an IBI device
(International Biotechnologies, Inc., New Haven, CT) is
used to purify large dsDNA fragments. For oligo-nts,
PAGE and electroelution with an Epigene device (Epigene
Corp., Baltimore, MD) are an alternative to HPLC. One
alternative for DNA purification is HPLC on a Waters
(division of Millipore Corporation) HPLC system using
the GenPak(TM)-FAX column. A sample of 100 picograms
(pg) to 10 ug can be loaded and recovered in 10%-80%
yield. The recovery varies with the size and
concentration of the DNA, and whether it is single or
double stranded. A NAP5 column from Pharmacia (Sweden)
is used to desalt DNA eluted from the GenPak column.
After passage over the NAP5 column, the DNA solution is
vacuum desiccated.

Sec. 6.1: Cloning of Known osp-ipbd gene into OCV:

In this section, we clone the osp-ipbd gene or the
display probe that we have designed. In the preferred
method, the synthetic gene is constructed using
plasmids that are transformed into bacterial cells by
standard methods (MANI82, p250) or slightly modified
standard methods. Alternatively, DNA fragments derived
from nature are operably linked to other fragments of
DNA derived from nature or to synthetic DNA fragments.
In most cases of the preferred method, gene synthesis
involves construction of a series of plasmids
containing larger and larger segments of the complete
gene. Each plasmid that contains a newly added portion
of the osp-ipbd gene or of the display probe is tested
by restriction digestion. Plasmids having the expected
restriction digestion pattern are sequenced in the
region of the latest alteration to confirm the
construction.



If, for convenience, small plasmids were used for
gene synthesis, the complete osp-ipbd gene or display
probe is subcloned into the OCV at this point.

Sec. 6.2: Cloning of Random DNA (Potential osp) into
Display Probe:

If random DNA and phenotypic selection or
screening are used to obtain a GP(IPBD), then we clone
random DNA into one of the restriction sites that was
designed into the display probe.

The random DNA may be obtained in a variety of
ways. Degenerate synthetic DNA is one possibility.
Alternatively, pseudorandom DNA may be taken from
nature. If, for example, an Sph I site (GCATG/C) has
been designed into the display probe at one end of the
ipbd fragment, then we would use Nla III (CATG/) to
partially digest some DNA that contains a wide variety
of sequences, generating a wide variety of fragments
with CATG 3' overhangs. Preferably, the display probe is
designed with different restriction sites at each end
of the ipbd gene so that random DNA can be cloned at
either end at the user's discretion. The genome of an
organism would be a suitable source of DNA with high
sequence diversity.
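
The variety of fragments produced by a partial Nla III
digest can be illustrated with a small sketch. It is a
simplified model, assuming a linear molecule and tracking
only the top strand (cut after CATG, as in "CATG/" above);
the function names and the 15-nt test sequence are
hypothetical.

```python
def nla3_cuts(top):
    """Top-strand cut positions for Nla III (CATG/), expressed as the
    index of the first base after each cut."""
    cuts = []
    i = top.find("CATG")
    while i != -1:
        cuts.append(i + 4)
        i = top.find("CATG", i + 1)
    return cuts


def partial_digest(top):
    """Every fragment obtainable when any subset of sites is cut: all
    stretches between two boundaries (molecule ends or cut sites)."""
    bounds = sorted(set([0] + nla3_cuts(top) + [len(top)]))
    return [top[a:b] for i, a in enumerate(bounds) for b in bounds[i + 1:]]


dna = "AACATGTTTCATGGG"
print(nla3_cuts(dna))
print(partial_digest(dna))
```

A molecule with k internal cut sites yields (k + 1)(k + 2)/2
distinct stretches in this model, which is why partial
digestion of diverse DNA gives a wide variety of inserts.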

A plasmid carrying the display probe is digested
with the appropriate restriction enzyme and the
fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for
expression of the antibiotic-resistance gene. Plasmid-
bearing GPs are then selected for the display-of-IPBD
phenotype by the procedure given in Sec. 15 of the
present invention using AfM(IPBD) as if it were the
target. Sec. 15 is designed to isolate GP(PBD)s that
bind to a target from a large population that do not
bind. Use of the procedure of Sec. 15 to isolate a
genetic construction that leads to the display of a
single type of IPBD is different from the designed use
in one important way: any GP that displays the IPBD
will bind tightly and GPs that do not display IPBD will
not bind, hence any reasonable amount of AfM(IPBD) on
the matrix will identify a successful clone.

As an alternative to selecting GP(IPBD)s through
binding to an affinity column, we can isolate colonies
or plaques and screen through use of one of the methods
listed in Sec. 8 to identify clonal isolates that
display IPBD on the GP outer surface.



Sec. 7: Harvest of GPs:

After transforming cells with ligated cloning
vectors, we first grow the GPs in non-selective
conditions to allow expression of the antibiotic-
resistance markers on the cloning vector. After a
grow-out, we apply selective pressure to kill
untransformed cells.

GPs are harvested by methods appropriate to the GP
at hand, generally, centrifugation to pelletize GPs and
resuspension of the pellets in sterile medium (cells)
or buffer (spores or phage).



Sec. 8: Verification of Display:



The harvested packages are now tested to determine
whether the IPBD is present on the surface. In any
tests of GPs for the presence of IPBD on the GP
surface, any ions or cofactors known to be essential
for the stability of IPBD or AfM(IPBD) must be included
at appropriate levels. The tests can be done: a) by
affinity labeling, b) enzymatically, c)
spectrophotometrically, d) by affinity separation, or
e) by affinity precipitation. The AfM(IPBD) in this
step is one picked to have strong affinity
(preferably, K_d < 10^-11 M) for the IPBD molecule and
little or no affinity for the wtGP. For example, if
BPTI were the IPBD, trypsin, anhydrotrypsin, or
antibodies to BPTI could be used as the AfM(BPTI) to
test for the presence of BPTI. Anhydrotrypsin, a
trypsin derivative with serine 195 converted to
dehydroalanine, has no proteolytic activity but retains
its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the
surface of the GP is demonstrated through the use of a
soluble, labeled derivative of a AfM(IPBD) with high
affinity for IPBD. The label could be: a) a
radioactive atom such as 125I, b) a chemical entity
such as biotin, or c) a fluorescent entity such as
rhodamine or fluorescein. The labeled derivative of
AfM(IPBD) is denoted as AfM(IPBD)*. The preferred
procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested
for the presence of IPBD; conditions of mixing
should favor binding of IPBD to AfM(IPBD)*,

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical
activity (enzymatic or inhibitory), its presence on the
GP can be verified through this activity. For example,
if the IPBD were BPTI, then one could use the
stoichiometric inactivation of trypsin not only to
demonstrate the presence of BPTI, but also to
quantitate the amount.
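
The quantitation reduces to simple arithmetic, sketched
below. The sketch assumes 1:1 stoichiometry between
displayed BPTI and inactivated trypsin and that every
displayed BPTI is accessible; the function name and the
example quantities are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ipbd_copies_per_gp(trypsin_inactivated_mol, n_packages):
    """Average number of displayed BPTI domains per genetic package,
    assuming each displayed BPTI inactivates exactly one trypsin."""
    if n_packages <= 0:
        raise ValueError("n_packages must be positive")
    return trypsin_inactivated_mol * AVOGADRO / n_packages


# Hypothetical example: 1 pmol of trypsin inactivated by 1e11 packages.
print(round(ipbd_copies_per_gp(1e-12, 1e11), 2))
```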

If the IPBD has strong, characteristic absorption
bands in the visible or UV that are distinct from
absorption by the wtGP, then another alternative for
measuring the IPBD displayed on the GP is a
spectrophotometric measurement. For example, if IPBD
were azurin, the visible absorption could be used to
identify GPs that display azurin.

Another alternative is to label the GPs and
measure the amount of label retained by immobilized
AfM(IPBD). For example, the GPs could be grown with a
radioactive precursor, such as 32Pi or 3H-thymidine,
and the radioactivity retained by immobilized AfM(IPBD)
measured.

Another alternative is to use affinity
chromatography; the ability of a GP bearing the IPBD to
bind a matrix (cf. Sec. 15.1) that supports a AfM(IPBD)
is measured by reference to the wtGP.

Another alternative for detecting the presence of
IPBD on the GP surface is affinity precipitation.

If random DNA has been used, then the procedures
of Sec. 15 are used to obtain a clonal isolate that has
the display-of-IPBD phenotype. Alternatively, clonal
isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one
or more of these clonal isolates.

If no isolates that bind to the affinity molecule
are obtained we take corrective action as disclosed in
Sec. 9.

If one or more of the tests above indicates that
the IPBD is displayed on the GP surface, we verify that
the binding of molecules having known affinity for IPBD
is due to the chimeric osp-ipbd gene through the use of
standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent
GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP
to verify that loss of osp-ipbd causes loss of
binding,

3) showing that binding of GPs to AfM(IPBD)
correlates with [XINDUCE] (in those cases that
expression of osp-ipbd is controlled by
[XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is
specific to the immobilized AfM(IPBD) and not to
the support matrix.

Variation of: a) binding of GPs by soluble AfM(IPBD)*,
b) absorption caused by IPBD, and c) biochemical
reactions of IPBD are linear in the amount of IPBD
displayed. Presence of IPBD on the GP surface is
indicated by a strong correlation between [XINDUCE] and
the reactions that are linear in the amount of IPBD.
Leakiness of the promoter is not likely to present
problems of high background with assays that are linear
in the amount of IPBD. These experiments may be
quicker and easier than the genetic tests.
Interpreting the effect of [XINDUCE] on binding to a
{AfM(IPBD)} column, however, may be problematic unless
the regulated promoter is completely repressed in the
absence of [XINDUCE]. The affinity retention of
GP(IPBD)s is not linear in the number of IPBDs/GP and
there may be, for example, little phenotypic difference
between GPs bearing 5 IPBDs and GPs bearing 50 IPBDs.
The demonstration that binding is to AfM(IPBD) and the
genetic tests are essential; the tests with XINDUCE are
optional.

We sequence the relevant ipbd gene fragment from
each of several clonal isolates to determine the
construction.

We establish the maximum salt concentration and pH
range for which the GP(IPBD) binds the chosen
AfM(IPBD). This is preferably done by measuring, as a
function of salt concentration and pH, the retention of
AfM(IPBD)* on molecular sizing filters that pass
AfM(IPBD)* but not GP.



If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed to Part II, otherwise we must
analyze the result and adopt appropriate corrective
measures.

Sec. 9: Correcting the Display System:

If we have attempted to fuse an ipbd fragment to a
natural osp fragment, our options are:

1) pick a different fusion to the same osp by

a) using opposite end of osp,

b) keeping more or fewer residues from osp in
the fusion; for example, in increments of 3
or 4 residues,

c) trying a known or predicted domain
boundary,

d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to random DNA method.

If we have just tried the random DNA method
unsuccessfully, our options are:

1) choose a different relationship between ipbd
fragment and random DNA (ipbd first, random DNA
second or vice versa),

2) try a different degree of partial digestion, a
different enzyme for partial digestion, a
different degree of shearing or a different source
of natural DNA, or

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been
tried and the random DNA method has been tried, both
without success, we pick a new GP.

Summary of Part I:

In Part I, we have constructed a GP(IPBD).
Although the target material is not picked until Part
III, we have already discussed the general properties
of targets that influence the choice of IPBD. The user
may use the first GP(IPBD) as the starting point for
design and construction of other GPs: GP(IPBD1),
GP(IPBD2), etc. The different IPBDs might differ in
charge and size in such a way that, for any target, at
least one of the GP(IPBD)s will be appropriate as a
starting point to develop a protein that will bind to
that target.

Part II 

Sec. 10.0: Affinity Separation Means:

In Part II we optimize an affinity separation
system that will be used in Part III to enrich a
population of GP(vgPBD)s for those GP(PBD)s that
display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means,
but FACS, electrophoresis, or other means may also be
used.

Sec. 10.1: Optimization of Affinity Chromatography
Separation:

For linear gradients, elution volume and eluant
concentration are directly related. Changes in eluant
concentration cause GPs to elute from the column.
Elution volume, however, is more easily measured and
specified. It is to be understood that the eluant
concentration is the agent causing GP release and that
an eluant concentration can be calculated from an
elution volume and the specified gradient.
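
For a linear gradient the conversion is a straight-line
interpolation, sketched below. The sketch assumes an ideal
gradient with no dead volume or mixing delay; the function
name, parameter names, and the example gradient (0.01 M to
1.0 M over 100 ml) are hypothetical.

```python
def eluant_concentration(volume, c_start, c_end, gradient_volume):
    """Eluant concentration reached after `volume` of a linear gradient
    running from c_start to c_end over `gradient_volume` (same units
    as `volume`)."""
    if not 0 <= volume <= gradient_volume:
        raise ValueError("volume lies outside the gradient")
    return c_start + (c_end - c_start) * volume / gradient_volume


# A package eluting at 50 ml of a 100-ml, 0.01 M to 1.0 M gradient:
print(eluant_concentration(50, 0.01, 1.0, 100))
```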

Using a specified elution regime, we compare the
elution volumes of GP(IPBD)s with the elution volumes
of wtGP on affinity columns supporting AfM(IPBD).
Comparisons are made at various: a) amounts of IPBD/GP,
b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM),
c) initial ionic strengths, d) elution rates, e)
amounts of GP/(volume of support), f) pHs, and g)
temperatures, because these are the parameters most
likely to affect the sensitivity and efficiency of the
separation. We then pick those conditions giving the
best separation.

We do not optimize pH or temperature; rather we
record optimal values for the other parameters for one
or more values of pH and temperature. The pH used must
be within the range of pH for which GP(IPBD) binds the
AfM(IPBD) that is being used in this step. The
conditions of intended use specified by the user (Sec.
11) may include a specification of pH or temperature.
If pH is specified, then pH will not be varied in
eluting the column (Sec. 15.3). Decreasing pH may,
however, be used to liberate bound GPs from the matrix.
Similarly, if the intended use specifies a temperature,
we will hold the affinity column at the specified
temperature during elution, but we might vary the
temperature during recovery. If the intended use
specifies the pH or temperature, then we prefer that
the affinity separation be optimized for all other
parameters at the specified pH and temperature.

In the optimization devised in this step, we
preferably use a molecule known to have moderate
affinity for the IPBD (K_d in the range 10^-6 M to
10^-8 M), for the following reason. When populations of
GP(vgPBD)s are fractionated, there will be roughly
three subpopulations: a) those with no binding, b)
those that have some binding but can be washed off with
high salt or low pH, and c) those that bind very
tightly and must be rescued in situ. We optimize the
parameters to separate (a) from (b) rather than (b)
from (c). Let PBD_w be a PBD having weak binding to the
target and PBD_s be a PBD having strong binding. Higher
DoAMoM might, for example, favor retention of GP(PBD_w)
but also make it very difficult to elute viable
GP(PBD_s). We will optimize the affinity separation to
retain GP(PBD_w) rather than to allow release of
GP(PBD_s) because a tightly bound GP(PBD_s) can be
rescued by in situ growth. If we find that DoAMoM
strongly affects the elution volume, then in Part III
we may reduce the amount of target on the affinity
column when an SBD has been found with moderately
strong affinity (K_d on the order of 10^-7 M) for the
target.



In case the promoter of the osp-ipbd gene is not
regulated by a chemical inducer, we optimize DoAMoM,
the elution rate, and the amount of GP/volume of
matrix. If the optimized affinity separation is
acceptable, we proceed. If not, we must develop a
means to alter the amount of IPBD per GP. Among GPs
considered in the present invention, this case could
arise only for spores because regulatable promoters are
available for all other systems.

If the amount of IPBD/spore is too high, we could
engineer an operator site into the osp-ipbd gene. We
choose the operator sequence such that a repressor
sensitive to a small diffusible inducer recognizes the
operator. Alternatively, we could alter the
Shine-Dalgarno sequence to produce a lower homology with
consensus Shine-Dalgarno sequences. If the amount of
IPBD/spore is too low, we can introduce variability
into the promoter or Shine-Dalgarno sequences and
screen colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of
genetically pure GPs that elute from the affinity
matrix as sharp bands that can be detected by UV
absorption. Alternatively, samples from effluent
fractions can be plated on suitable medium (cells or
spores) or on sensitive cells (phage) and colonies or
plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates,
initial ionic strengths, and loadings should be
examined. The following is only one of many ways in
which the affinity separation could be optimized. We
anticipate that optimal values of IPBD/GP and DoAMoM
will be correlated and therefore should be optimized
together. The effects of initial ionic strength,
elution rate, and amount of GP/(matrix volume) are
unlikely to be strongly correlated, and so they can be
optimized independently.



For each set of parameters to be tested, the
column is eluted in a specified manner. For example,
we may use a regime called Elution Regime 1: a KCl
gradient runs from 10 mM to maximum allowed for the
GP(IPBD) viability in 100 fractions of 0.05 V_v,
followed by 20 fractions of 0.05 V_v at maximum allowed
KCl; pH of the buffer is maintained at the specified
value with a convenient buffer such as phosphate, Tris,
or MOPS. Other elution regimes can be used; what is
important is that the conditions of this optimization
be similar to the conditions that are used in Part III
for selection for binding to target (Sec. 15.3) and
recovery of GPs from the chromatographic system (Sec.
15.4).



When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE].
Appropriate values of [XINDUCE] depend on the identity
of XINDUCE and the promoter; if, for example, XINDUCE
is isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0
uM, and 1.0 mM would be appropriate levels to test.
The range of variation of [XINDUCE] is extended until
an optimum is found or an acceptable level of
expression is obtained.



DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in
appropriate steps. We anticipate that the efficiency
of separation will be a smooth function of DoAMoM so
that it is appropriate to cover a wide range of values
for DoAMoM with a coarse grid and then explore the
neighborhood of the approximate optimum with a finer
grid.
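
The coarse-then-fine scan of DoAMoM can be illustrated with a short Python sketch. The log spacing, the function names, and the particular numbers (four coarse points from 100% down to 0.1% of the maximum, then five points within a factor of 3 of a presumed optimum at 10%) are assumptions made for the illustration; the text only requires "appropriate steps".

```python
def log_grid(high, low, points):
    """Return `points` values evenly spaced on a log scale from high to low."""
    if points < 2 or high <= 0 or low <= 0:
        raise ValueError("need at least two points and positive bounds")
    ratio = (low / high) ** (1.0 / (points - 1))
    return [high * ratio ** i for i in range(points)]


def refine_around(center, factor, points):
    """Finer log-spaced grid from center*factor down to center/factor."""
    return log_grid(center * factor, center / factor, points)


# DoAMoM expressed as percent of the maximum the matrix can bind.
coarse = log_grid(100.0, 0.1, 4)      # roughly 100, 10, 1, 0.1
fine = refine_around(10.0, 3.0, 5)    # neighborhood of a presumed optimum
print(coarse)
print(fine)
```

A log-spaced coarse grid covers three decades with few columns; the finer grid is then centered on whichever coarse value gave the best separation.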



Several values of initial ionic strength are
tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM.
Low ionic strength favors binding between oppositely
charged groups, but could also cause GP to precipitate.

The elution rate is varied, by successive factors
of 1/2, from the maximum attainable rate to 1/16 of
this value. If the lowest elution rate tested gives
the best separation, we test lower elution rates until
we find an optimum or adequate separation.

The goal of the optimization is to obtain a sharp
transition between bound and unbound GPs, triggered by
increasing salt or decreasing pH or a combination of
both. This optimization need be performed only: a) for
each temperature to be used, b) for each pH to be used,
and c) when a new GP(IPBD) is created.



Sec. 10.2: Measuring the sensitivity of affinity
separation:



Once the values of IPBD/GP, DoAMoM, initial ionic
strength, elution rate, and amount of GP/(volume of
affinity support) have been optimized, we determine the
sensitivity of the affinity separation (C_sensi) by the
following procedure that measures the minimum quantity
of GP(IPBD) that can be detected in the presence of a
large excess of wtGP. The user chooses a number of
separation cycles, denoted N_chrom, that will be
performed before an enrichment is abandoned;
preferably, N_chrom is in the range 6 to 10 and N_chrom
must be greater than 4. Enrichment can be terminated
by isolation of a desired GP(SBD) before N_chrom passes.
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The measurement of sensitivity is significantly 
expedited if GP(IPBD) and wtGP carry different
selectable markers because such markers allow easy
identification of colonies obtained by plating
fractions obtained from the chromatography column. For
example, if wtGP carries kanamycin resistance and
GP(IPBD) carries ampicillin resistance, we can plate
fractions from a column on non-selective media suitable
for the GP. Transfer of colonies onto ampicillin- or
kanamycin-containing media will determine the identity
of each colony.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g. 1/10) over an appropriate range, typically
10^11 through 10^4. Large values of V_lim are tested
first; once a positive result is obtained for one value
of V_lim, no smaller values of V_lim need be tested.
Each mixture is applied to a column supporting, at the
optimal DoAMoM, an AfM(IPBD) having high affinity for
IPBD and the column is eluted by the specified elution
regime, such as Elution Regime 1. The last fraction
that contains viable GPs and an inoculum of the column
matrix material are cultured. If GP(IPBD) and wtGP
have different selectable markers, then transfer onto
selection plates identifies each colony. If GP(IPBD)
and wtGP have no selectable markers or the same
selectable markers, then a number (e.g. 32) of GP
clonal isolates are tested for presence of IPBD by the
techniques discussed in Sec. 3. If IPBD is not
detected on the surface of any of the isolated GPs,
then GPs are pooled from: a) the last few (e.g. 3 to 5)
fractions that contain viable GPs, and b) an inoculum
taken from the column matrix. The pooled GPs are
cultured and passed over the same column and enriched
for GP(IPBD) in the manner described. This process is
repeated until N_chrom passes have been performed, or
until the IPBD has been detected on the GPs. If
GP(IPBD) is not detected after N_chrom passes, V_lim is
decreased and the process is repeated.

Once a value for V_lim is found that allows
recovery of GP(IPBD)s, the factor by which V_lim is
varied is reduced and additional values are tested
until the limiting value of V_lim is known to within a
factor of two.
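
The search over V_lim described above (largest excess first, then a finer scan until the limit is bracketed within a factor of two) can be sketched in Python. The callback `recoverable` is a hypothetical stand-in for the wet-lab experiment, and the geometric bisection used for the fine pass is an assumed scheme; the text says only that the step factor is reduced.

```python
import math


def find_v_lim(recoverable, v_max=1e11, v_min=1e4, coarse_factor=10.0):
    """Largest tested wtGP excess V_lim at which GP(IPBD) is still recovered.

    `recoverable(v)` is hypothetical: it should return True if GP(IPBD)
    is detected within N_chrom passes starting from a 1:v mixture.
    """
    # Coarse pass: test the largest excess first and step downward.
    v = v_max
    failed_above = None
    while v >= v_min:
        if recoverable(v):
            break
        failed_above = v
        v /= coarse_factor
    else:
        return None  # nothing recovered anywhere in the tested range

    if failed_above is None:
        return v  # the largest value tested already worked

    # Fine pass (assumed scheme): geometric bisection between the first
    # success and the last failure until they differ by at most 2x.
    low, high = v, failed_above
    while high / low > 2.0:
        mid = math.sqrt(low * high)
        if recoverable(mid):
            low = mid
        else:
            high = mid
    return low


# Toy oracle: pretend recovery succeeds up to a 3e7-fold excess of wtGP.
estimate = find_v_lim(lambda v: v <= 3e7)
print(estimate)
```

Each call to the oracle corresponds to up to N_chrom chromatographic passes, so the coarse factor of ten keeps the number of experiments small before refinement begins.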



C_sensi equals the highest value of V_lim for which
the user can recover GP(IPBD) within N_chrom passes.
The number of chromatographic cycles (K_cyc) that were
needed to isolate GP(IPBD) gives a rough estimate of
C_eff; C_eff is approximately the K_cyc-th root of V_lim:

$C_{eff} \approx \exp\left(\ln(V_{lim}) / K_{cyc}\right)$

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff = (approx.) 736.
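
The rough estimate of C_eff as the K_cyc-th root of V_lim can be checked numerically. The sketch below is illustrative Python with assumed function and argument names; it shows that the quoted value of about 736 corresponds to V_lim = 4.0 x 10^8 with three cycles.

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Rough per-cycle enrichment: the k_cyc-th root of v_lim."""
    if v_lim <= 0 or k_cyc < 1:
        raise ValueError("v_lim must be positive and k_cyc at least 1")
    return math.exp(math.log(v_lim) / k_cyc)


# Three cycles to pull GP(IPBD) out of a 4.0e8-fold excess of wtGP.
c_eff = estimate_c_eff(4.0e8, 3)
print(round(c_eff, 1))
```

The estimate assumes that every cycle enriches by the same factor, which is why it is only a starting point for the more careful measurement in Sec. 10.3.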
Sec. 10.3: Measuring the efficiency of separation:

To determine C_eff more accurately, we determine
the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD)
column that yields approximately equal amounts of
GP(IPBD) and wtGP after elution. We prepare mixtures
of GP(IPBD) and wtGP in ratios GP(IPBD):wtGP :: 1:Q; we
start Q at twenty times the approximate C_eff found in
Sec. 10.2. A 1:Q mixture of GP(IPBD) and wtGP is
applied to an AfM(IPBD) column and eluted by the
specified elution regime, such as Elution Regime 1. A
sample of the last fraction that contains viable GPs is
plated at a dilution that gives well separated colonies
or plaques. The presence of IPBD or the osp-ipbd gene
in each colony or plaque can be determined by a number
of standard methods, including: a) use of different
selectable markers, b) nitrocellulose filter lift of
GPs and detection with AfM(IPBD)* (AUSU87), or c)
nitrocellulose filter lift of GPs and detection with
radiolabeled DNA that is complementary to the osp-ipbd
gene (AUSU87). Let F be the fraction of GP(IPBD)
colonies found in the last fraction containing viable
GPs. When a Q is found such that 0.20 < F < 0.80, then

$C_{eff} = Q \cdot F$

If F < 0.2, then we reduce Q by an appropriate factor
(e.g. 1/10) and repeat the procedure. If F > 0.8, then
we increase Q by an appropriate factor (e.g. 2) and
repeat the procedure.
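
The decision rule for adjusting Q can be written out as a small Python sketch. The function name and the tuple return convention are assumptions for illustration; the thresholds and example factors (1/10 and 2) are those given in the text. The text does not say what to do when F is exactly 0.2 or 0.8, so the sketch accepts those boundary values.

```python
def next_step(q, f, low=0.2, high=0.8, reduce_factor=0.1, increase_factor=2.0):
    """Apply the Q-adjustment rule to one measurement.

    q: wtGP excess in the 1:Q mixture that was loaded.
    f: fraction of GP(IPBD) colonies in the last viable fraction.
    Returns ("done", c_eff) or ("repeat", new_q).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if f < low:
        # Too few GP(IPBD) colonies: the excess of wtGP was too large.
        return ("repeat", q * reduce_factor)
    if f > high:
        # Almost all colonies are GP(IPBD): challenge with more wtGP.
        return ("repeat", q * increase_factor)
    return ("done", q * f)


print(next_step(1000.0, 0.5))   # F in range: C_eff = Q * F
print(next_step(1000.0, 0.05))  # F too low: reduce Q
print(next_step(1000.0, 0.95))  # F too high: increase Q
```

Starting Q at twenty times the rough estimate from Sec. 10.2 means the first measurement will usually fall in the "reduce Q" branch, after which the loop converges in a few rounds.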

Sec. 10.4: Other Separation Means

Other separation means are optimized in a manner
parallel to that used for affinity chromatography.

FACS is likely to be most appropriate for
bacterial cells and spores because the sensitivity of
the machines requires approximately 1000 molecules of
fluorescent label bound to each GP to accomplish a
separation. An appropriate commercial FACS machine is
a FACStar from Becton-Dickinson, Mountain View, CA.
To optimize FACS separation of GPs, we use a derivative
of AfM(IPBD) that is labeled with a fluorescent
molecule, denoted AfM(IPBD)*. The variables that must
be optimized include: a) amount of IPBD/GP, b)
concentration of AfM(IPBD)*, c) ionic strength, d)
concentration of GPs, and e) parameters pertaining to
operation of the FACS machine. Because AfM(IPBD)* and
GPs interact in solution, the binding will be linear in
both [AfM(IPBD)*] and [displayed IPBD]. Preferably,
these two parameters are varied together. The other
parameters can be optimized independently. The
sensitivity and efficiency of the FACS separation are
determined in a manner parallel to those used for
chromatography.



Electrophoresis is most appropriate to
bacteriophage because of their small size. Serwer
(SERW87) has reviewed use of agarose-gel
electrophoresis to separate phage based on charge.
Electrophoresis is a preferred separation means if the
target is so small that chemically attaching it to a
column or to a fluorescent label would essentially
change the entire target. For example, chloroacetate
ions contain only seven atoms and would be essentially
altered by any linkage. GPs that bind chloroacetate
would become more negatively charged than GPs that do
not bind the ion and so these classes of GPs could be
separated.
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The parameters to optimize for electrophoresis 
include: a) IPBD/GP, b) concentration of gel material,
e.g. agarose, c) concentration of AfM(IPBD), d) ionic
strength, e) size, shape, and cooling capacity of the
electrophoresis apparatus, f) voltages and currents,
and g) concentration of GPs. Preferably, IPBD/GP and
[AfM(IPBD)] are varied at the same time and other
parameters are optimized independently.

In Part II we have determined optimal conditions
for separating GPs based on proteins displayed on the
GP surface. We have also determined the capabilities
of the affinity separation system. Knowledge of these
capabilities allows us to choose appropriate levels of
variegation in Part III.

Part III

Sec. 11.0: Choice of target material:

Any material may be chosen as target material,
subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
applied to a solid support suitable for affinity
separation,

2) after application to a matrix, the target
material must not react with water,

3) after application to a matrix, the target
material must not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a matrix allows enough unaltered surface area
(generally at least 500 A^2, excluding the atom
that is connected to the linker) for protein
binding.

If FACS is to be used as the affinity separation
means, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
conjugated to a suitable fluorescent dye or the
target must itself be fluorescent,

2) after any necessary fluorescent labeling, the
target must not react with water,

3) after any necessary fluorescent labeling, the
target material must not bind or degrade proteins
in a non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a suitable dye allows enough unaltered surface
area (generally at least 500 A^2, excluding the
atom that is connected to the linker) for protein
binding.

If affinity electrophoresis is to be used, then: 

1) the target must either be charged or of such a
nature that its binding to a protein will change
the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade
proteins in a non-specific way, and

4) the target must be compatible with a suitable
gel material.




Possible target materials include, but are not
limited to:
horse heart myoglobin

4) yeast phenylalanyl tRNA

5) asbestos

7) proteins

low density lipoprotein

prostaglandin PGE2

alpha interferon

adenylate cyclase toxin

aflatoxin B1

aspartame

15) haem

16) bilirubin

morphine

codeine

dichlorodiphenyl...

benzo(a)pyrene

actinomycin D

any retroviral protease

fibril or flagellar protein from any of several
spirochete bacterial species, such as the organisms
causing syphilis

zeolites



33) hydroxylapatite

34) DNA of a defined sequence

35) fibrin

36) tumor necrosis factor

37) specific monoclonal antibodies

A supply of several milligrams of pure target 
material is desired. Impure target material could be
used, but one might obtain a protein that binds to a
contaminant instead of to the target.

The following information about the target 
material is highly desirable: 

1) stability as a function of temperature, pH, and
ionic strength,

2) stability with respect to chaotropes such as
urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions,
such as haem or Ca+2, and

6) proteolytic activity, if any.

30 

In addition to this most desirable information, it
is useful to know: 1) the target's sequence, if the
target is a macromolecule, 2) the 3D structure of the
target, 3) enzymatic activity, if any, and 4) toxicity,
if any.

The user of the present invention specifies
certain parameters of the intended use of the binding 
protein: 

1) the acceptable temperature range, 

2) the acceptable pH range, 

3) the acceptable concentrations of ions and 
neutral solutes, 

4) the maximum acceptable dissociation constant
for the target and the SBD:

$K_T = \frac{[Target][SBD]}{[Target{:}SBD]}$

In some cases, the user may require discrimination
between T, the target, and N, some non-target. Let

$K_T = \frac{[T][SBD]}{[T{:}SBD]}$, and
$K_N = \frac{[N][SBD]}{[N{:}SBD]}$,

then $\frac{K_T}{K_N} = \frac{[T][N{:}SBD]}{[N][T{:}SBD]}$.

The user then specifies a maximum acceptable value for
the ratio K_T/K_N.
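
The relation between the two dissociation constants and their ratio can be checked with a few lines of Python. This is an illustrative sketch only; the function names and the equilibrium concentrations in the example are hypothetical values chosen to make the arithmetic easy to follow.

```python
def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD] / [ligand:SBD], all in the same molar units."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand * free_sbd / complex_conc


def discrimination_ratio(t, t_sbd, n, n_sbd):
    """K_T / K_N = ([T][N:SBD]) / ([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


# Hypothetical equilibrium concentrations (molar) in one mixture.
sbd = 1e-6
t, t_sbd = 1e-6, 1e-3   # target and target:SBD complex
n, n_sbd = 1e-6, 1e-6   # non-target and non-target:SBD complex

k_t = dissociation_constant(t, sbd, t_sbd)
k_n = dissociation_constant(n, sbd, n_sbd)
ratio = discrimination_ratio(t, t_sbd, n, n_sbd)
print(k_t, k_n, ratio)
```

A smaller ratio means stronger preference for the target over the non-target, which is why the user specifies a maximum acceptable value.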

The target material must be stable under the 
specified conditions of pH, temperature, and solution 
conditions.



If the target material is a protease, one must
consider the following points:

1) a highly specific protease can be treated like
any other target,

2) a general protease, such as subtilisin, may
degrade the OSPs of the GP including OSP-PBDs;
there are several alternative ways of dealing with
general proteases, including: a) a chemical
inhibitor may be used to prevent proteolysis (e.g.
phenylmethylsulfonyl fluoride (PMSF), which inhibits
serine proteases), b) one or more active-site
residues may be mutated to create an inactive
protein (e.g. a serine protease in which the
active serine is mutated to alanine), or c) one or
more active-site amino acids of the protein may be
chemically modified to destroy the catalytic
activity (e.g. a serine protease in which the
active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need
not be inhibitors; SBDs that happen to inhibit
the protease target are a fairly small subset of
SBDs that bind to the protease target,

4) the more we modify the target protease, the
less likely we are to obtain an SBD that inhibits
the target protease, and

5) if the user requires that the SBD inhibit the
target protease, then the active site of the
target protease must not be modified any more than
necessary; inactivation by mutation or chemical
modification are preferred methods of inactivation
and a protein protease inhibitor becomes a prime
candidate for IPBD. For example, BPTI could be
mutated, by the methods of the present invention,
to bind to proteases other than trypsin (see
TSCH87).
Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to
the chosen target according to the criteria of Sec. 2.
It is anticipated that a small collection of
GP(IPBD)s can be assembled such that, for any chosen
target, at least one member of the collection will be a
suitable starting point for engineering a protein that
binds to the chosen target by the methods of the
present invention.

If the pH, temperature, or other parameters of the
intended use of the selected SBD differ markedly from
the conditions used to optimize the affinity separation
for the chosen GP(IPBD), then the user should optimize
the affinity separation for conditions appropriate to
the intended use by the methods described in Part II.

Sec. 13.0: Identification of Family of PBDs, Related
to PPBD, to Be Generated

Sec. 13.1: Choosing residues on IPBD (or other PPBD)
to vary:



We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD.
Because the number of residues that could strongly
influence binding is always greater than the number
that can be varied simultaneously, the user must pick a
subset of those residues to vary at one time. The user
must also pick trial levels of variegation and
calculate the abundances of various sequences. The
list of varied residues and the level of variegation at
each varied residue are adjusted until the composite
variegation is commensurate with C_sensi and M_ntv.

We now consider the principles that guide our
choice of residues of the IPBD to vary. A key concept
is that only structured proteins exhibit specific
binding, i.e. can bind to a particular chemical entity
to the exclusion of most others. Thus the residues to
be varied are chosen with an eye to preserving the
underlying IPBD structure. Substitutions that prevent
the PBD from folding will cause GPs carrying those
genes to bind indiscriminately so that they can easily
be removed from the population.

Burial of hydrophobic surfaces so that bulk water
is excluded is one of the strongest forces driving the
binding of proteins to other molecules. Bulk water can
be excluded from the region between two molecules only
if the surfaces are complementary. We must test as
many surfaces as possible to find one that is
complementary to the target. The selection-through-
binding isolates those proteins that are more nearly
complementary to some surface on the target. The
effective diversity of a variegated population is
measured by the number of different surfaces, rather
than the number of protein sequences. Thus we should
maximize the number of surfaces generated in our
population, rather than the number of protein
sequences.

In hypothetical example 1, we consider a
hypothetical PBD, shown in Figure 4, binding to a
hypothetical target. Figure 4 is a 2D schematic of 3D
objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14,
15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39
of the IPBD are on the 3D surface of the IPBD, even
though shown well inside the circle. Proteins do not
have distinct, countable faces. Therefore we define an
"interaction set" to be a set of residues such that all
members of the set can simultaneously touch one
molecule of the target material without any atom of the
target coming closer than van der Waals distance to any
main-chain atom of the IPBD. The concept of a residue
"touching" a molecule of the target is discussed below.
One hypothetical interaction set, Set A, in Figure 4
comprises residues 6, 7, 20, 21, 22, 33, and 34,
represented by squares. Another hypothetical
interaction set, Set B, comprises residues 1, 2, 4, 6,
31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example,
through all twenty amino acids, we obtain 20 protein
sequences and 20 different surfaces for interaction set
A. Note that residue 6 is in two interaction sets and
variation of residue 6 through all 20 amino acids
yields 20 versions of interaction set A and 20 versions
of interaction set B.

Now consider varying two residues, each through
all twenty amino acids, generating 400 protein
sequences. If the two residues varied were, for
example, number 1 and number 21, then there would be
only 40 different surfaces because interaction set A
does not depend on residue 1 and interaction set B does
not depend on residue 21. If the two residues varied,
however, were number 7 and number 21, then 400 surfaces
would be generated.

If N spatially separated residues are varied at
one time, 20 x N surfaces are generated. Variation of
N residues in the same interaction set yields 20^N
surfaces. For example, if N = 7, variation of
separated residues yields 140 surfaces while variation
of interacting residues yields 20^7 = 1.3 x 10^9
surfaces. Thus, to maximize the number of surfaces
generated when N residues are varied, all residues
should be in the same interaction set because variation
of several residues in one interaction set generates an
exponential number of surfaces while variation of
spatially separated surface residues generates only a
linear number.
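
The counting argument for interaction sets can be reproduced with a short Python sketch. The residue numbers of Set A and Set B come from the hypothetical example in the text; the function name and the rule of summing over sets are an assumed formalization of that example.

```python
SET_A = {6, 7, 20, 21, 22, 33, 34}
SET_B = {1, 2, 4, 6, 31, 37, 39}


def count_surfaces(varied, interaction_sets, alphabet=20):
    """Count distinct surfaces produced by varying the given residues.

    Each interaction set containing k varied residues contributes
    alphabet**k versions; sets with no varied residue contribute none.
    """
    total = 0
    for residues in interaction_sets:
        k = len(set(varied) & residues)
        if k:
            total += alphabet ** k
    return total


sets = [SET_A, SET_B]
print(count_surfaces({21}, sets))      # one residue in Set A
print(count_surfaces({6}, sets))       # residue shared by both sets
print(count_surfaces({1, 21}, sets))   # one residue in each set
print(count_surfaces({7, 21}, sets))   # two residues in Set A
print(20 * 7, 20 ** 7)                 # N = 7: separated vs. same set
```

Varying residues that share an interaction set multiplies the surface count, while residues in different sets only add to it.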

15 

The amount of surface area buried in strong 
protein-protein interactions ranges from 1000 A^2 to
2000 A^2, as summarized by Schulz and Schirmer (SCHU79,
p103ff). Individual amino acids have total surface
areas that depend mostly on type of amino acid and
weakly on conformation. These areas range from about
150 A 2 Cor glycine to about .160 A 2 for tryptophan. 
Averages of total surface area by amino acid type and
maximum exposed surface area of each amino acid type
for two typical proteins, hen egg white lysozyme (HEWL)
and T4 lysozyme (T4L), are shown in Table 6. From
these exposures, one can calculate that 1000 A^2 on a
protein surface comprises between -; and 30 amino acids, 
depending on the amino acid types and the protein 3D
structure. Varied amino acid sequences, as found in
actual proteins, involve between 10 and 25 residues in
forming 1000 A^2 of protein surface. Schulz and
Schirmer estimate that 100 A^2 of protein surface can
exhibit as many as 1000 different specific patterns
(SCHU79, p105). The number of surface patterns rises
exponentially with the area that can be varied
independently. One of the BPTI structures recorded in
the Brookhaven Protein Data Bank (6PTI), for example,
has a total exposed surface area of 3997 A^2 (using the
method of Lee and Richards (LEE71) and a solvent
radius of 1.4 A and atomic radii as shown in Table 7).
If we could vary this surface freely and if 100 A^2 can
produce 1000 patterns, we could construct 10^120
different patterns by varying the surface of BPTI!
This calculation is intended only to suggest the huge
number of possible surface patterns based on a common
protein backbone.
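
The order-of-magnitude estimate for BPTI can be reproduced in a few lines of Python. The function name and defaults are assumptions for illustration; the inputs (3997 A^2 of exposed surface, 1000 patterns per 100 A^2) are the figures quoted in the text.

```python
import math


def log10_patterns(area, patch_area=100.0, patterns_per_patch=1000.0):
    """Base-10 exponent of the pattern count for a freely variable surface.

    Treats the surface as area/patch_area independent patches, each able
    to show patterns_per_patch patterns, so the counts multiply.
    """
    if area <= 0 or patch_area <= 0 or patterns_per_patch <= 1:
        raise ValueError("inputs must be positive, with more than one pattern")
    return (area / patch_area) * math.log10(patterns_per_patch)


exponent = log10_patterns(3997.0)
print(round(exponent))  # exponent of ten for the whole BPTI surface
```

About 40 patches at three decades each gives an exponent near 120, which is the 10^120 figure in the text.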
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One protein framework cannot, however, display all 
possible patterns over any one particular 100 A 2 of 
surface merely by replacement of the side groups of 
surface residues. The protein backbone holds the 
varied side groups in approximately constant locations
so that the variations are not independent. We can,
nevertheless, generate a vast collection of different
protein surfaces by varying those protein residues that 
face the outside of the protein. 
Figure 5 shows BPTI in contact with myoglobin.
From this we can see that residues 3, 7, 6, 10, 13, 39,
41, and 42 can all simultaneously contact a molecule
the size and shape of myoglobin. Figure 5 also shows
that residue 49 cannot touch a single myoglobin
molecule simultaneously with any of the first set even
though all are on the surface of BPTI. It is not the
intent of the present invention, however, to use models
to determine which part of the target molecule will
actually be the site of binding by PBD.
If cassette mutagenesis is picked, the protein
residues to be varied are, preferably, close enough
together in sequence that the variegated DNA (vgDNA)
encoding all of them can be made in one piece. The
present invention is not limited to a particular length
of vgDNA that can be synthesized. With current
technology, a stretch of 60 amino acids (180 DNA bases)
can be spanned.
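
Whether a chosen set of residues can be covered by a single cassette follows directly from the 60-residue (180-base) figure. The Python sketch below is illustrative; the function names are assumptions, and the 60-residue limit is the "current technology" figure from the text rather than a fixed constraint.

```python
def cassette_span(residues):
    """Number of residues from the first to the last varied position."""
    positions = list(residues)
    if not positions:
        raise ValueError("at least one residue position is required")
    return max(positions) - min(positions) + 1


def fits_one_cassette(residues, max_span=60):
    """True if one piece of vgDNA can encode all the varied residues."""
    return cassette_span(residues) <= max_span


def vgdna_length(residues):
    """Length in bases of the DNA spanning the varied residues."""
    return 3 * cassette_span(residues)


set_a = [6, 7, 20, 21, 22, 33, 34]  # hypothetical Set A from the text
print(fits_one_cassette(set_a), vgdna_length(set_a))
print(fits_one_cassette([3, 70]))   # too far apart for one cassette
```

Residue sets that fail this check call for one of the alternatives described next: multiple mutating primers or two cassettes.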

Further, when there is reason to mutate residues 
further than sixty residues apart, one can use other 
mutational means, such as single-stranded-
oligonucleotide-directed mutagenesis (BOTS85) using two
or more mutating primers.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated as
follows:

1) vgDNA having a low level of variegation (for
example, 20 to 400 fold variegation) is introduced
into one cassette in the OCV,

2) cells are transformed and cultured,

3) vg OCV DNA is obtained,

4) a second segment of vgDNA is inserted into a
second cassette in the OCV, and

5) cells are transformed and cultured, GPs are
harvested and subjected to selection-through-binding.

The composite level of variation must not exceed the
prevailing capabilities to a) produce very large
numbers of independently transformed cells or b) detect
small components in a highly varied population. The
limits on the level of variegation are discussed in
Sec. 13.2.

Here we assemble the data about the IPBD and the
target that are useful in deciding which residues to
vary in the variegation cycle:

1) 3D structure, or at least a list of residues on
the surface of the IPBD,

2) list of sequences homologous to IPBD, and

3) model of the target molecule or a stand-in for
the target.

These data and an understanding of the behavior of
different amino acids in proteins will be used to
answer two questions:

1) which residues of the IPBD are on the outside
and close enough together in space to touch the
target simultaneously?

2) which residues of the IPBD can be varied with
high probability of retaining the underlying IPBD
structure?

Although an atomic model of the target material
(obtained through X-ray crystallography, NMR, or other
means) is preferred in such examination, it is not
necessary. For example, if the target were a protein
of unknown 3D structure, it would be sufficient to know
the molecular weight of the protein and whether it were
a soluble globular protein, a fibrous protein, or a
membrane protein. Physical measurements, such as low-
angle neutron diffraction, can determine the overall
molecular shape, viz. the ratios of the principal
moments of inertia. One can then choose a protein of
known structure of the same class and similar size and
shape to use as a molecular stand-in and yardstick. It
is not essential to measure the moments of inertia of
the target because, at low resolution, all proteins of
a given size and class look much the same. The
specific volumes are the same, all are more or less
spherical, and therefore all proteins of the same size
and class have about the same radius of curvature. The
radii of curvature of the two molecules determine how
much of the two molecules can come into contact.

Several graphical and computational tools are
needed or useful. The most appropriate method of
picking the residues of the protein chain at which the
amino acids should be varied is by viewing, with
interactive computer graphics, a model of the IPBD. A
stick-figure representation of molecules is preferred.
A suitable set of hardware is an Evans & Sutherland
PS390 graphics terminal (Evans & Sutherland
Corporation, Salt Lake City, UT) and a MicroVAX II
supermicro computer (Digital Equipment Corp., Maynard,
MA). The computer should, preferably, have at least
150 megabytes of disk storage, so that the Brookhaven
Protein Data Bank can be kept on line. A FORTRAN
compiler, or some equally good higher-level language
processor, is preferred for program development.

Suitable programs for viewing and manipulating protein
models include: a) PS-FRODO, written by T. A. Jones
(JONE95) and distributed by the Biochemistry Department
of Rice University, Houston, TX; and b) PROTEUS,
developed by Dayringer, Tramontano, and Fletterick
(DAYR36). Important features of PS-FRODO and PROTEUS
that are needed to view and manipulate protein models
for the purposes of the present invention are the
abilities to: 1) display molecular stick figures of
proteins and other molecules, 2) zoom and clip images
in real time, 3) prepare various abstract
representations of the molecules, such as a line
joining C_alphas and side group atoms, 4) compute and
display solvent-accessible surfaces reasonably quickly,
5) point to and identify atoms, and 6) measure distance
between atoms.

In addition, one could use theoretical
calculations, such as dynamic simulations of proteins,
to estimate whether a substitution at a particular
residue of a particular amino-acid type might produce a
protein of approximately the same 3D structure as the
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein; calculations of this
sort may be useful but are not required.

Sec. 13.1.1: The principal set:

In this section we pick a principal set of
residues of the IPBD to vary. Using the knowledge of
which residues are on the surface of the IPBD (as noted
above), we pick residues that are close enough together
on the surface of the IPBD to touch a molecule of the
target simultaneously without having any IPBD main-
chain atom come closer than van der Waals distance
(viz. 4.0 to 5.0 Å) from any target atom. For the
purposes of the present invention, a residue of the
IPBD "touches" the target if: a) a main-chain atom is
within van der Waals distance, viz. 4.0 to 5.0 Å, of any
atom of the target molecule, or b) the C_beta is within
D_cutoff of any atom of the target molecule so that a
side-group atom could make contact with that atom.
Because side groups differ in size (cf. Table 35), some
judgment is required in picking D_cutoff. In the
preferred embodiment, we will use D_cutoff = 8.0 Å, but
other values in the range 6.0 Å to 10.0 Å could be
used. If IPBD has G at a residue, which has no C_beta,
we construct a pseudo C_beta with the correct bond
distance and angles and judge the ability of the
residue to touch the target from this pseudo C_beta.
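The "touches" test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coordinates are plain (x, y, z) tuples in Å, the upper end (5.0 Å) of the van der Waals range is used as the main-chain threshold, and the function and parameter names (`touches`, `vdw`, `d_cutoff`) are hypothetical and do not come from the text.

```python
import math

VDW_MAX = 5.0    # assumed threshold: upper end of the 4.0-5.0 A range in the text
D_CUTOFF = 8.0   # preferred C_beta cutoff from the text (6.0-10.0 A is allowed)


def touches(main_chain_atoms, c_beta, target_atoms,
            vdw=VDW_MAX, d_cutoff=D_CUTOFF):
    """Return True if a residue "touches" the target under rule a) or b).

    main_chain_atoms: list of (x, y, z) tuples for the residue's main chain
    c_beta: (x, y, z) of the C_beta, or of a pseudo C_beta for glycine
    target_atoms: list of (x, y, z) tuples for the target (or its stand-in)
    """
    for t in target_atoms:
        # rule a): some main-chain atom lies within van der Waals distance
        if any(math.dist(m, t) <= vdw for m in main_chain_atoms):
            return True
        # rule b): the C_beta lies within D_cutoff of a target atom
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False


# Made-up coordinates: one target atom at the origin, two candidate residues.
target = [(0.0, 0.0, 0.0)]
near = touches([(10.0, 0.0, 0.0)], (7.0, 0.0, 0.0), target)   # C_beta at 7 A
far = touches([(12.0, 0.0, 0.0)], (9.0, 0.0, 0.0), target)    # C_beta at 9 A
print(near, far)
```

Raising `d_cutoff` toward 10.0 Å admits residues with longer side groups, which is the judgment call the text describes.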

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all
residues in the set and a molecule of the target. This
method is appropriate if the target is a macromolecule,
such as a protein, because the PBDs derived from the
IPBD will contact only a part of the macromolecular
surface. The surfaces of macromolecules are irregular
with varying curvatures. If we pick residues that
define a surface that is not too convex, then there
will be a region on a macromolecular target with a
compatible curvature.

In addition to the geometrical criteria, we prefer
that there be some indication that the underlying IPBD
structure will tolerate substitutions at each residue
in the principal set of residues. Indications could
come from various sources, including: a) homologous
sequences, b) static computer modeling, or c) dynamic
computer simulations.

The residues in the principal set need not be
contiguous in the protein sequence. The exposed
surfaces of the residues to be varied do not need to be
connected. We require only that the amino acids in the
residues to be varied all be capable of touching a
molecule of the target material simultaneously without
having atoms overlap. If the target were, for example,
horse heart myoglobin, and if the IPBD were BPTI, any
set of residues in one interaction set of BPTI defined
in Table 34 could be picked.

Preferably, the principal set contains eight to
sixteen residues. This number of residues allows
sufficient variability that a surface that is
complementary to the target can be found, but is small
enough that a significant fraction of the surface can
be varied at one time.



Sec. 13.1.2: The secondary set:

The secondary set comprises those residues not in
the primary set that touch residues in the primary set.
These residues might be excluded from the primary set
because: a) the residue is internal, b) the residue is
highly conserved, or c) the residue is on the surface,
but the curvature of the IPBD surface prevents the
residue from being in contact with the target at the
same time as one or more residues in the primary set.



Internal residues are frequently conserved and the
amino acid type can not be changed to a significantly
different type without substantial risk that the
protein structure will be disrupted. Nevertheless,
some conservative changes of internal residues, such as
I to L or F to Y, are tolerated. Such conservative
changes affect the detailed placement and dynamics of
adjacent protein residues and such variation may be
useful once an SBD is found.

Surface residues in the secondary set are most
often located on the periphery of the principal set.
Such peripheral residues can not make direct contact
with the target simultaneously with all the other
residues of the principal set. The charge on the amino
acid in one of these residues could, however, have a
strong effect on binding. Once an SBD is found, it is
appropriate to vary the charge of some or all of these
residues. For example, the variegated codon containing
equimolar A and G at base 1, equimolar C and A at base
2, and A at base 3 (i.e. the codons ACA, AAA, GCA, and
GAA) yields amino acids T, A, K, and E with equal
probability.

Choice of residues to vary initially:

Choice of residues in the primary and secondary
set is based on: a) geometry of the IPBD and the
geometrical relationship between the IPBD and the
target (or a stand-in for the target) in a hypothetical
complex, and b) sequences of proteins homologous to the
IPBD. In this section we pick a subset of the residues
in the primary and secondary sets, based on geometry
and on the maximum allowed level of variegation that
assures progressivity. The allowed level of
variegation determines how many residues can be varied
at once; geometry determines which ones.




The user may pick residues to vary in many ways;
the following is a preferred manner. Pairs of residues
are picked that are diametrically opposed across the
face of the principal set. Two such pairs are used to
delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed
triangle, having as large an area as possible, on the
surface are picked. One to three other residues are
picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary
creates the possibility for high specificity because
all the intervening residues must have acceptable
complementarity before favorable interactions can occur
at widely-separated residues.

The number of residues picked is coupled to the
range through which each can be varied by the
restrictions discussed in Sec. 13.2. In the first
round, we do not assume any binding between IPBD and
the target and so progressivity is not an issue. At
the first round, the user may elect to produce a level
of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20^10 = approx. 10^13). One
run of the DNA synthesizer produces approximately 10^13
molecules of length 100 nts. Inefficiencies in
ligation and transformation will reduce the number of
proteins actually tested to between 10^7 and 5 x 10^8.
Multiple replications of the process with such very
high levels of variegation will not yield repeatable
results; the user must decide whether this is
important.



Sec. 13.2: Range of variation at Each Site of
Mutation:

Having picked which residues to vary, we must now
decide the range of amino acids to allow at each
variable residue. The total level of variegation is
the product of the number of variants at each varied
residue. Each varied residue can have a different
scheme of variegation, producing 2 to 20 different
possibilities. We require that the process be
progressive, i.e. each variegation cycle produces a
better starting point for the next variegation cycle
than the previous cycle produced.

N.B.: Setting the level of variegation such
that the ppbd and many sequences related to
the ppbd sequence are present in detectable
amounts insures that the process is
progressive. If the level of variegation is
so high that the ppbd sequence is present at
such low levels that there is an appreciable
chance that no transformant will display the
PPBD, then the best SBD of the next round
could be worse than the PPBD. At excessively
high level of variegation, each round of
mutagenesis is independent of previous rounds
and there is no assurance of progressivity.
This approach can lead to valuable binding
proteins, but repetition of experiments with
this level of variegation will not yield
progressive results. Excessive variation is
not preferred.

Hypothetical example 2 considers the effects of
the level of variegation on the progressivity of the
process of the present invention. Figure 6 is a
schematic view of a hypothetical eight-residue binding
surface of a PBD comprising residues 11, 24, 25, 30,
34, 42, 44, and 47 of a hypothetical protein. Each
polygon represents the exposed portion of one residue.
By hypothesis, there exists at least one protein, shown
in Figure 6e, having a specific amino acid in each of
the eight residues that will bind to the target, but we
do not, at first, know what that sequence is.

The IPBD, shown in Figure 6a, may have none of the
optimal amino acids on its surface. Because we begin
with no information, our initial estimate is that all
amino acids have equal likelihood of being the best at
each of the eight residues.

By hypothesis, the genetic engineering system of
hypothetical example 2 has M_ntv = 10^7 and the
selection-through-binding system has C_sensi = 10^7
(i.e. it can detect 1 CP in 10^7). Also by hypothesis,
the variegation method can produce all amino acids at a
given residue with equal probability.

In the first variegation, we vary residues 11, 24,
25, 34, and 44 through all twenty amino acids,
producing 20^5 = 3.2 x 10^6 sequences. The capabilities
of the genetic engineering system allow all these
sequences to be present in the selection step and the
selection system can detect 1 CP in 10^7. By
hypothesis, we isolate a CP carrying an sbd gene that
encodes the first SBD, shown in Figure 6b, that has
improved binding for the target and has the amino acid
sequence W11-F24-E25-G30-D34-E42-P44-T47. This amino
acid sequence becomes the parental sequence to the next
variegation. After the first variegation and
selection, the evidence favors W11, F24, E25, D34, and
P44 as optimal amino acids at their respective
residues. That residues 30, 42, and 47 were not varied
has two implications:

1) we still have no information about which amino
acid is optimal at these residues, and

2) the amino acids selected at the varied residues
are optimal, given the identities of the amino
acids in the non-varied residues; when residues
30, 42, and 47 are varied, our estimate of the
optimal amino acids in other residues may change.

Now consider two versions of a variegation that
take the first intermediate SBD as parent and that
might get us closer to the optimal SBD.

In the first version of the second variegation, we
vary only five residues, producing 3.2 x 10^6 sequences,
all of which are expressed and subjected to selection-
through-binding. We vary residues 30, 42, and 47
because they were not varied previously. We also vary
two other residues so that as many surfaces as possible
are tested; residues 24 and 34 are chosen. Suppose
that we isolate a CP that carries an sbd gene encoding
the amino acid sequence W11-L24-E25-I30-D34-R42-P44-
K47, shown in Figure 6c. Consider the reason that D is
retained at residue 34. We know that all the sequences
W11-L24-E25-I30-x34-R42-P44-K47 (where x runs through
all twenty amino acids) were tested and therefore can
conclude with improved confidence that D34 is optimal,
given the rest of the selected sequence. Now consider
the change at residue 24 from F to L. We know that all
the sequences W11-x24-E25-I30-D34-R42-P44-K47 were
tested and we can conclude that L24 is optimal, given
the rest of the sequence. At each of the varied
residues, we gain information about which amino acids
are optimal under the conditions imposed.

In the second version, we will vary residues 11,
24, 30, 34, 42, and 47, each through all twenty amino
acids, producing 20^6 = 6.4 x 10^7 possible different
sequences. Our hypothesis is that only 1.0 x 10^7 of
these sequences are produced and subjected to
selection. Because only 15.6% of the programmed
sequences are actually subjected to selection, it is
likely that the parental sequence, W11-F24-E25-G30-D34-
E42-P44-T47, is not present in the selection step and
there is, consequently, no assurance that the best SBD
binds more tightly to target than did the parental PBD.
Suppose that we isolate a CP that carries an sbd gene
encoding the amino acid sequence V11-R24-E25-Q30-D34-
R42-P44-D47, shown in Figure 6d. Consider the reason
that D is retained at residue 34. Is it that D is
optimal, or is it that, by chance, the sequence
encoding the optimal amino acid, x, was not present as
V11-R24-E25-Q30-x34-R42-P44-D47 in the sample? We do
not know and therefore can not conclude that D34 is
optimal. Furthermore, retaining an amino acid can not
move us toward the optimal sequence. Now consider the
change at residue 24 from F to R. Was V11-R24-E25-Q30-
D34-R42-P44-D47 selected because R24 is optimal in the
presence of V11-x24-E25-Q30-D34-R42-P44-D47, or was V11-
R24-E25-Q30-D34-R42-P44-D47 selected because V11-F24-
E25-Q30-D34-R42-P44-D47 was not present to be selected?
Again, we do not know and can not conclude that R24 is
an improvement, i.e. we can not conclude that R24 is
more likely to be optimal than is F24. In both cases,
we lose information about which amino acids belong at
each residue. We may have obtained an SBD with
superior binding to the target. Another variegation
cycle at this level of variegation, however, may
produce a better protein or a worse protein and the
process is not progressive.
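The contrast between the two versions can be put in numbers with a short sketch. It assumes, beyond what the text states, that each of the 10^7 transformants is an independent, uniform draw from the programmed sequences; under that assumption the chance that one particular sequence (such as the parent) is missing is (1 - 1/V)^N. The names `coverage` and `p_parent_absent` are illustrative only.

```python
import math

M_NTV = 10**7   # independent transformants, as in hypothetical example 2


def coverage(variants, transformants=M_NTV):
    """Upper bound on the fraction of programmed sequences that can be tested."""
    return min(1.0, transformants / variants)


def p_parent_absent(variants, transformants=M_NTV):
    """Chance that one given sequence is never drawn (uniform sampling assumed)."""
    # (1 - 1/V)^N, evaluated with log1p for numerical stability
    return math.exp(transformants * math.log1p(-1.0 / variants))


v1 = 20**5   # version 1: five residues, 3.2 x 10^6 sequences
v2 = 20**6   # version 2: six residues, 6.4 x 10^7 sequences

print(f"version 1: coverage {coverage(v1):.3f}, parent absent {p_parent_absent(v1):.3f}")
print(f"version 2: coverage {coverage(v2):.3f}, parent absent {p_parent_absent(v2):.3f}")
```

Under this sampling model the parent is very likely present in version 1 and very likely absent in version 2, which is the sense in which only the first version is progressive.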

Let us contrast versions 1 and 2 of the second
variegation. In version 1, we retained more
information, viz. that W11 allows improved binding, and
therefore our selection of K47 incorporates the
information obtained in the previous rounds. In
version 2 of the second variegation, we discarded the
information that W11 allows stronger binding than V11.

Progressivity is not an all-or-nothing property.
So long as most of the information obtained from
previous variegation cycles is retained and many
different surfaces that are related to the PPBD surface
are produced, the process is progressive. If the level
of variegation is so high that the ppbd gene may not be
detected, the assurance of progressivity diminishes.
If the probability of recovering PPBD is negligible,
then the probability of progressive behavior is also
negligible.

An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to PPBD as possible within the
constraint that the PPBD be detectable.

We defer specification of exactly how much
variegation is allowed until we have: a) specified real
nt distributions for a variegated codon, and b)
examined the effects of discrepancies between specified
nt distributions and actual nt distributions.



Sec. 13.3: Design of vgDNA Encoding PBD Family:



We must now decide how to distribute the
variegation within the codons for the residues to be
varied. These decisions are influenced by the nature
of the genetic code. When vgDNA is synthesized,
variation at the first base of a codon creates a
population containing amino acids from the same column
of the genetic code table (as shown in the Table 3-6 on
p87 of WATSS7); variation at the second base of the
codon creates a population containing amino acids from
the same row of the genetic code table; variation at
the third base of the codon creates a population
containing amino acids from the same box. If two or
three bases in the same codon are varied, the pattern
is more complicated. Work with 3D protein structural
models may suggest definite sets of amino acids to
substitute at a given residue, but the method of
variation may require either more or fewer kinds of
amino acids be included. For example, examination of a
model might suggest substitution of N or Q at a given
residue. Combinatorial variation of codons requires
that mixing N and Q at one location also include K and
H as possibilities at the same residue. One must
choose to put: 1) N only, 2) Q only, or 3) a mixture of
N, K, H, and Q. The present invention does not rely on
accurate predictions of which amino acids should be
placed at each residue; rather, attention is focused on
which residues should be varied.
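The N/Q example can be checked by enumerating the codons that a set of base choices produces. The sketch below builds the standard genetic code from its usual TCAG-ordered string; the helper name `encoded` and the particular base choices are illustrative assumptions, not part of the text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def encoded(base1, base2, base3):
    """Sorted amino acids ('*' = stop) reachable from the allowed bases."""
    return sorted({CODE[a + b + c] for a, b, c in product(base1, base2, base3)})


# N is AAT/AAC and Q is CAA/CAG, so allowing both needs A or C at base 1
# and a mixture at base 3; with T or G at base 3 the codons are
# AAT (N), AAG (K), CAT (H), and CAG (Q).
n_or_q = encoded("AC", "A", "TG")

# A charge-varying codon: A or G at base 1, C or A at base 2, A at base 3.
charge_codon = encoded("AG", "CA", "A")

print(n_or_q)
print(charge_codon)
```

Because the bases at each position are mixed independently, the set of codons is always a full product of the per-position choices, which is why K and H cannot be excluded once N and Q are both allowed.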

There are many ways to generate diversity in a
protein. (See RECIIS6, CARU3 5, and CLIPS 6.) One extreme
case is that one or a few residues of the protein are
varied as much as possible (inter alia see CAKU65,
CARU87, RICU36, and WHAP.S6). We will call this limit
"Focused Mutagenesis". Focused Mutagenesis is
appropriate when the IPBD or other PPBD shows little or
no binding to the target, as at the beginning of the
search for a protein to bind to a new target material.
When there is no binding between the PPBD and the
target, we preferably pick a set of five to seven
residues on the surface and vary each through all 20
possibilities.

An alternative plan of mutagenesis ("Diffuse
Mutagenesis") that may be useful is to vary many more
residues through a more limited set of choices (See
Vershon et al., Ch5 of INOU36 and PAKU86). This can
be accomplished by spiking each of the pure nts
activated for DNA synthesis (e.g. nt-phosphoramidites)
with a small amount of one or more of the other
activated nts. Contrary to general practice, the
present invention sets the level of spiking so that
only a small percentage (1% to .0001%, for example)
of the final product will contain the initial DNA
sequence. This will insure that many single, double,
triple, and higher mutations occur, but that recovery
of the basic sequence will be a possible outcome. Let
N_b be the number of bases to be varied, and let Q be
the fraction of all sequences that should have the
parental sequence; then M, the fraction of the mixture
that is the majority component, is

M = exp( log_e(Q) / N_b ) = 10^( log_10(Q) / N_b ).

If, for example, thirty base pairs on the DNA
chain were to be varied and 1% of the product is to
have the parental sequence, then each mixed nt
substrate should contain 86% of the parental nt and 14%
of other nts. Table 8 shows the fraction (fn) of DNA
molecules having n non-parental bases when 30 bases are
synthesized with reagents that contain fraction M of
the majority component. When M=.63096, f24 and higher
are less than 10^-3. The entry "most" in Table 8 is the
number of changes that has the highest probability.
Note that substantial probability for multiple
substitutions only occurs if the fraction of parental
sequence (f0) is allowed to drop to around 10^-6.
Mutagenesis of this sort can be applied to any part of
the protein at any time, but is most appropriate when
some binding to the target has been established. The
N_b base pairs of the DNA chain that are synthesized
with mixed reagents need not be contiguous. They are
picked so that between N_b/3 and N_b codons are affected
to various degrees. The residues picked for mutation
are picked with reference to the 3D structure of the
IPBD, if known. For example, one might pick all or
most of the residues in the principal and secondary
set. We may impose restrictions on the extent of
variation at each of these residues based on homologous
sequences or other data. The mixture of non-parental
nts need not be random; rather, mixtures can be biased
to give particular amino acid types specific
probabilities of appearance at each codon. For
example, one residue may contain a hydrophobic amino
acid in all known homologous sequences; in such a case,
the first and third base of that codon would be varied,
but the second would be set to T. Other examples of
how this might be done will be given in the Detailed
Example. This diffuse structure-directed mutagenesis
will reveal the subtle changes possible in protein
backbone associated with conservative interior changes,
such as V to I, as well as some not so subtle changes
that require concomitant changes at two or more
residues of the protein.
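The spiking arithmetic can be reproduced directly. The sketch below assumes that each of the N_b varied bases independently keeps the parental nt with probability M, so the number of non-parental bases follows a binomial distribution; the function names are illustrative and are not taken from Table 8.

```python
import math


def majority_fraction(q, n_b):
    """M such that M**n_b == q, i.e. M = exp(ln(q) / n_b)."""
    return math.exp(math.log(q) / n_b)


def f_n(n, n_b, m):
    """Fraction of molecules with exactly n non-parental bases (binomial model)."""
    return math.comb(n_b, n) * (1.0 - m) ** n * m ** (n_b - n)


N_B = 30
m_1pct = majority_fraction(0.01, N_B)    # about 0.86, as in the text
m_1ppm = majority_fraction(1e-6, N_B)    # about 0.63096, as in the text

# "most": the number of changes with the highest probability for Q = 1%
most = max(range(N_B + 1), key=lambda n: f_n(n, N_B, m_1pct))

print(f"M for Q=1%: {m_1pct:.4f}; M for Q=1e-6: {m_1ppm:.5f}; most likely n: {most}")
```

With Q = 1% the distribution peaks at a handful of substitutions per molecule, while Q near 10^-6 shifts most of the mass to many simultaneous substitutions.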

For Focused Mutagenesis, we now consider the
distribution of nts that will be inserted at each
variegated codon. Each codon could be programmed
differently. If we have no information indicating that
a particular amino acid or class of amino acid is
appropriate, we strive to substitute all amino acids
with equal probability because representation of one
PBD above the detectable level is wasteful. Equal
amounts of all four nts at each position in a codon
yields the amino acid distribution:



4/64 A    2/64 C    2/64 D    2/64 E    2/64 F    4/64 G
2/64 H    3/64 I    2/64 K    6/64 L    1/64 M    2/64 N
4/64 P    2/64 Q    6/64 R    6/64 S    4/64 T    4/64 V
1/64 W    2/64 Y    3/64 Stop



This distribution has the disadvantage of giving two
basic residues for every acidic residue. In addition,
six times as much R, S, and L as W or M occur. If five
codons are synthesized with this distribution,
sequences encoding five Rs are 7776 times more abundant
than sequences encoding five Ws. To have W-W-W-W-W
present at detectable levels, we must have R-R-R-R-R
present in 7776-fold excess.
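These counts follow from the standard genetic code and can be tallied in a few lines. The sketch below is illustrative only; the variable names are not from the text.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# With equimolar nts at all three bases every codon has weight 1/64,
# so the abundance of an amino acid is its codon count over 64.
counts = Counter(CODE.values())

basic = counts["K"] + counts["R"]     # basic residues counted as K and R
acidic = counts["D"] + counts["E"]    # acidic residues counted as D and E

# Five consecutive codons: R-R-R-R-R versus W-W-W-W-W
ratio_five = counts["R"] ** 5 // counts["W"] ** 5

for aa in sorted(counts):
    print(f"{counts[aa]}/64 {aa}")
print("basic:acidic =", basic, ":", acidic, " five-codon ratio =", ratio_five)
```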

Consider the distribution of amino acids encoded
by one codon in a population of vgDNA. Let Abun(x) be
the abundance of DNA sequences coding for amino acid x;
Abun(x) is uniquely defined by the distribution of nts
at each base of the codon. For any distribution, there
will be a most-favored amino acid (mfaa) with abundance
Abun(mfaa) and a least-favored amino acid (lfaa) with
abundance Abun(lfaa). We seek the nt distribution that
allows all twenty amino acids and that yields the
largest ratio Abun(lfaa)/Abun(mfaa) subject to two
constraints. First, the abundances of acidic and basic
amino acids should be equal lest we bias the PBDs
toward a particular charge. Second, the number of stop
codons should be kept as low as possible. Thus only nt
distributions that yield Abun(E) + Abun(D) =
Abun(R) + Abun(K) are considered, and the function
maximized is:

{ (1 - Abun(stop)) (Abun(lfaa)/Abun(mfaa)) }.

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G; C or
G at the third base would be equivalent. All amino
acids are possible and the number of accessible stop
codons is reduced because TGA and TAA codons are
eliminated. The amino acids F, Y, C, H, N, I, and D
require T at the third base while W, M, Q, K, and E
require G. Thus we use an equimolar mixture of T and G
at the third base.

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (See Table
9), varies the composition at bases 1 and 2, in steps
of 0.05, and reports the composition that gives the
largest value of the quantity ((Abun(lfaa)/Abun(mfaa))
(1 - Abun(stop))). A vg codon is symbolically defined
by the nt distribution at each base:



              T      C      A      G
base #1 =    t1     c1     a1     g1
base #2 =    t2     c2     a2     g2
base #3 =    t3     c3     a3     g3

t1 + c1 + a1 + g1 = 1.0
t2 + c2 + a2 + g2 = 1.0
t3 = g3 = 0.5, c3 = a3 = 0.



The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E)+Abun(D) equals Abun(K)+Abun(R):

Abun(E) + Abun(D) = g1*a2

Abun(K) + Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

g2 = (g1*a2 - 0.5*a1*a2) / (c1 + 0.5*a1).

In addition,

t1 = 1 - a1 - c1 - g1
t2 = 1 - a2 - c2 - g2.



We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:

Optimum vgCodon

              T      C      A      G
base #1 =   0.26   0.18   0.26   0.30
base #2 =   0.22   0.16   0.40   0.22
base #3 =   0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino acid
with the abundances shown in Table 10.
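The search can be sketched as a plain grid search. What follows is not the program of Table 9; it is a minimal reconstruction from the constraints stated in the text (third base fixed at 0.5 T and 0.5 G, g2 computed from the charge-balance condition, every fraction kept positive). The names `abundances`, `score`, and `search` are hypothetical, and the value 0.18 for C at base 1 is the value that makes the base-1 fractions sum to 1.0 and reproduces g2 = 0.22.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
B3 = {"T": 0.5, "G": 0.5}   # third base: equimolar T and G


def abundances(b1, b2, b3=B3):
    """Abun(x) for each amino acid x ('*' = stop) given per-base nt fractions."""
    ab = {}
    for (x, px), (y, py), (z, pz) in product(b1.items(), b2.items(), b3.items()):
        aa = CODE[x + y + z]
        ab[aa] = ab.get(aa, 0.0) + px * py * pz
    return ab


def score(ab):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa); 0 if an amino acid is missing."""
    aa = [v for k, v in ab.items() if k != "*"]
    if len(aa) < 20 or min(aa) <= 0.0:
        return 0.0
    return (1.0 - ab.get("*", 0.0)) * min(aa) / max(aa)


def search(steps=20):
    """Grid search over a1, c1, g1, a2, c2 in increments of 1/steps."""
    best = (0.0, None, None)
    for ia, ic, ig in product(range(1, steps), repeat=3):
        it = steps - ia - ic - ig
        if it < 1:
            continue
        a1, c1, g1, t1 = ia / steps, ic / steps, ig / steps, it / steps
        for ja, jc in product(range(1, steps), repeat=2):
            a2, c2 = ja / steps, jc / steps
            # charge balance: Abun(E) + Abun(D) = Abun(K) + Abun(R)
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1.0 - a2 - c2 - g2
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            b1 = {"T": t1, "C": c1, "A": a1, "G": g1}
            b2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            s = score(abundances(b1, b2))
            if s > best[0]:
                best = (s, b1, b2)
    return best


# Score of the distribution reported in the text (0.18 for C at base 1 inferred).
OPT1 = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
OPT2 = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
opt_score = score(abundances(OPT1, OPT2))

# Coarse run with 10% steps to keep the example quick; steps=20 gives 5% steps.
best_score, best_b1, best_b2 = search(steps=10)
print(round(opt_score, 3), round(best_score, 3), best_b1, best_b2)
```

Calling `search(steps=20)` corresponds to the 5% pass described in the text and takes noticeably longer in pure Python; a 1% refinement would restrict the loops to a neighborhood of the coarse optimum.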

The computer that controls a DNA synthesizer, such
as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so called "dirty bottle"
synthesis. Either of these methods amounts to
specifying the nt distribution. The actual nt
distribution obtained will differ from the specified nt
distribution due to several causes, including: a)
differential inherent reactivity of nt substrates, and
b) differential deterioration of reagents. It is
possible to compensate partially for these effects, but
some residual error will occur. We denote the average
discrepancy between specified and observed nt fraction
as S_err,

S_err = square root ( average[ ((f_obs - f_spec)/f_spec)^2 ] )

where f_obs is the amount of one type of nt found at a
base and f_spec is the amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis, the
actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine S_err for the DNA
synthesizer and chemistry employed by the user.
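A user determining S_err from measured compositions might compute it as follows. The sketch assumes that S_err is the root-mean-square of the relative discrepancy (f_obs - f_spec)/f_spec, which is one reading of the definition above; the function name and the sample numbers are made up for illustration.

```python
import math


def s_err(pairs):
    """Root-mean-square relative discrepancy over (f_spec, f_obs) pairs."""
    squares = [((f_obs - f_spec) / f_spec) ** 2 for f_spec, f_obs in pairs]
    return math.sqrt(sum(squares) / len(squares))


# Hypothetical measurements at one variegated base: (specified, observed)
measured = [(0.26, 0.25), (0.18, 0.19), (0.26, 0.27), (0.30, 0.29)]
error_level = s_err(measured)

# A case where every nt is off by exactly 5% of its specified amount
uniform_5pct = s_err([(0.20, 0.21), (0.40, 0.38)])

print(f"S_err = {error_level:.3f}; uniform 5% case = {uniform_5pct:.3f}")
```

In practice the list would hold pairs pooled from 10 or 20 variegated bases, as the text suggests.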



To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times (1
- S_err) to the optimum value times (1 + S_err) in
equal steps (S_err is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D) + Abun(E) =
Abun(K) + Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - S_err)
to 0.5 times (1 + S_err) in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum DNA distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us to actually obtain the
following distribution that is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
S_err using the program "Find worst vgCodon within Serr
of given distribution." given in Table 11.

Optimum vgCodon, worst 5% errors

              T       C       A       G
base #1 =   0.251   0.189   0.273   0.287
base #2 =   0.209   0.160   0.400   0.231
base #3 =   0.475   0.0     0.0     0.525

This distribution yields DNA encoding different
amino acids at the abundances shown in Table 12.

If five codons are synthesized with reagents mixed
so as to produce the nt-distribution "Optimum vgCodon",
and if we actually obtained the nt-distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all of the five codons are about
277 times as likely as DNA sequences encoding the lfaa
at all of the five codons; about 24% of the DNA
sequences will have a stop codon in one or more of the
five codons.

When five codons are synthesized using equimolar
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 =
7776. If we program the optimum nt distribution and
come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277.
The total number of different PBDs is unchanged, but
the least-favored sequence is about 28 times more
abundant. Detecting the least-favored amino-acid
sequence when varying four residues with equimolar nts
at each varied base requires as sensitive a separation
system as does detecting the least-favored amino-acid
sequence when varying five residues with the optimized
nt distribution.

By hypothesis, the distribution "Optimal vgCodon"
is used in the second version of the second variegation
of hypothetical example 2. The abundance of the DNA
encoding each type of amino acid is, however, taken
from Table 12. The abundance of DNA encoding the
parental amino acid sequence is:

Amount (parental seq.)

     F24       C30       D34       E42       T47

= Abun(F) x Abun(C) x Abun(D) x Abun(E) x Abun(T)

= .0249 x .0663 x .0545 x .0602 x .0437
= 2.4 x 10^-7

Therefore, DNA encoding the PPBD sequence as well as
very many related sequences will be present in
sufficient quantity to be detected and we are assured
that the process will be progressive.
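
As a quick check of the arithmetic above, the product of the five per-residue abundances can be recomputed as follows. The abundance values are the ones quoted in the text; the dictionary and function names are illustrative only.

```python
import math

# Abundance of DNA encoding the parental amino acid at each
# variegated residue, as quoted in the text (from Table 12).
PARENTAL_ABUNDANCE = {
    "F24": 0.0249,
    "C30": 0.0663,
    "D34": 0.0545,
    "E42": 0.0602,
    "T47": 0.0437,
}

def parental_sequence_abundance(per_residue):
    """Multiply the per-residue abundances of the parental amino acids."""
    return math.prod(per_residue.values())

abundance = parental_sequence_abundance(PARENTAL_ABUNDANCE)
print(f"{abundance:.1e}")
```

The result rounds to the 2.4 x 10^-7 stated above.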

We use the following procedure to determine
whether a given level of variegation is practical:

1) from: a) the intended nt distribution at each
base of a variegated codon, and b) S_err (the error
level in mixed DNA synthesis), calculate the
abundances of DNA sequences coding for each amino
acid and stop,

2) calculate the abundance of DNA encoding the
PPBD sequence by multiplying the abundances of the
parental amino acid at each variegated residue.




The abundances used in the procedure above are
calculated from the worst distribution that is within
S_err of the specified distribution. A variegation that
ensures that the PPBD sequence can be recovered is
practical. PPBD can be recovered if the abundance of
PPBD-encoding DNA is larger than both 1/M_ntv and
1/C_sensi. Preferably, the abundance of PPBD-encoding
DNA is 3 to 10 times higher than both 1/M_ntv and
1/C_sensi to provide a margin of redundancy. M_ntv is
the number of transformants that can be made from *D100
DNA. With current technology M_ntv is approximately 5 x
10^8, but the exact value depends on the details of the
procedures adopted by the user. Improvements in
technology that allow more efficient: a) synthesis of
DNA, b) ligation of DNA, or c) transformation of cells
will raise the value of M_ntv. C_sensi is the
sensitivity of the affinity separation; improvements in
affinity separation will raise C_sensi. If the smaller
of M_ntv and C_sensi is increased, higher levels of
variegation may be used. For example, if C_sensi is 1
in 10^9 and M_ntv is 10^8, then improvements in C_sensi are
less valuable than improvements in M_ntv.
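
The recoverability criterion above can be written as a small check. This is a minimal sketch: the function names and the default margin of 3 (the low end of the preferred 3- to 10-fold range) are assumptions made for illustration, and C_sensi is expressed as the number X in "1 in X".

```python
def is_practical(ppbd_abundance, m_ntv, c_sensi, margin=3.0):
    """True if PPBD-encoding DNA exceeds both 1/M_ntv and 1/C_sensi
    by at least the chosen safety margin."""
    threshold = max(1.0 / m_ntv, 1.0 / c_sensi)
    return ppbd_abundance > margin * threshold

def limiting_factor(m_ntv, c_sensi):
    """Name the smaller of the two limits; raising it helps most."""
    return "M_ntv" if m_ntv < c_sensi else "C_sensi"

# Abundance from the worked example, with M_ntv = 5 x 10^8 and a
# hypothetical separation sensitivity of 1 in 10^9.
ok = is_practical(2.4e-7, m_ntv=5e8, c_sensi=1e9)

# The example in the text: C_sensi = 1 in 10^9, M_ntv = 10^8.
bottleneck = limiting_factor(1e8, 1e9)
```

With these numbers the variegation is practical, and the number of transformants is the bottleneck in the text's example.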

A level of variegation that allows recovery of the
PPBD has two properties:

1) we can not regress because the PPBD is
available,

2) an enormous number of multiple changes related
to the PPBD are available for selection and we are
able to detect and benefit from these changes.

It is very unlikely that all of the variants will
be worse than the PPBD; we require the presence of PPBD
at detectable levels to insure that all the sequences
present are indeed related to PPBD.

The user must adjust the list of residues to be
varied and levels of variegation at each residue until
the calculated variegation is within the bounds set by
M_ntv and C_sensi.

Preferably, we also consider the interactions
between the sites of variegation and the surrounding
DNA. If the method of mutagenesis to be used is
replacement of a cassette, we consider whether the
variegation will generate gratuitous restriction sites
and whether they seriously interfere with the intended
introduction of diversity. We reduce or eliminate
gratuitous restriction sites by appropriate choice of
variegation pattern and silent alteration of codons
neighboring the sites of variegation. See the Detailed
sie. 



Sec. 14.1: Insertion of synthetic vgDNA into the OCV:

In the case of cassette mutagenesis, the
restriction sites that were introduced when the gene
for the inserted domain was synthesized are used to
introduce the synthetic vgDNA into a plasmid or other
OCV. Restriction digestions and ligations are
performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-
directed mutagenesis, synthetic vgDNA is used to create
diversity in the vector (BOTS85).



Sec. 14.2: Transformation of cells

The present invention is not limited to any one
method of transforming cells with DNA. The following
procedure is a modification of that of Maniatis (p250,
MANI82). This procedure is only one example of how the
necessary transformations may be performed. The
procedure produces approximately (V_c/25) x 10^7 or more
transformants. The user picks a value for V_c, the
initial volume of the cell culture, to provide the
desired number of transformants. All water is triple
distilled and is treated with activated charcoal for 24
hours.

1) culture E. coli in V_c ml of LB broth at 37°C until
cell density reaches 5 x 10^7 to ? x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell
suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in V_c/3 ml
of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at
4000g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x V_c/25
ml of ice-cold, sterile 60 mM CaCl2; store cells at
4°C for from 10 minutes to 24 hours; transformation
efficiency increases by about 4-fold in the first 24
hours and then returns to the original value,

6) add DNA in ligation or TE buffer to V_c/250 ml of
cells; mix and store on ice for 30 minutes,

7) heat shock cells at 42°C for an appropriate amount
of time,

8) add V_c/25 ul LB broth and incubate at 37°C for 1
hour,

9) plate cells on LB agar containing antibiotic,

10) harvest GPs in appropriate manner.
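
Because every volume in the procedure above scales with V_c, the culture size needed for a desired number of transformants can be worked out mechanically. The sketch below assumes the yield of (V_c/25) x 10^7 transformants quoted in the text and the volume fractions as printed in steps (3), (5), and (6); the function name and dictionary keys are hypothetical.

```python
def culture_plan(desired_transformants):
    """Scale the transformation protocol to a target yield.

    Assumes a yield of (V_c / 25) x 10^7 transformants, with V_c in ml.
    """
    v_c = 25.0 * desired_transformants / 1e7
    return {
        "culture_volume_ml": v_c,              # step 1: V_c
        "first_cacl2_ml": v_c / 3.0,           # step 3: V_c/3
        "second_cacl2_ml": 2.0 * v_c / 25.0,   # step 5: 2 x V_c/25
        "cells_per_dna_addition_ml": v_c / 250.0,  # step 6: V_c/250
    }

# Planning for 10^8 transformants.
plan = culture_plan(1e8)
for name, volume in plan.items():
    print(f"{name}: {volume:.2f}")
```

Since the stated yield is a lower bound ("or more"), the plan errs on the side of a larger culture than strictly necessary.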

It is not necessary to isolate transformed cells
between transformation and affinity separation. We
prefer to have transformed cells at high concentration
so that they can be plated densely on relatively few
plates. For this purpose, steps (9) and (10) may be
replaced with a procedure in which the cells in step
(8) are further diluted with LB broth and the
selecting antibiotic is added. In the case of
ampicillin, lysis of sensitive cells occurs, and
resistant cells are enriched by centrifugation at 2 to
3 h after addition of antibiotic.

One routinely obtains between 10^7 and 5 x 10^8
transformants/ug of CCC DNA. Ligation efficiency
ranges from 0.1% for blunt-blunt insertions, to as
much as 15% for sticky-sticky insertions. For large
transformations, it may be desirable to purify DNA
between ligation and transformation because unligated
DNA is thought to compete with CCC DNA for entry into
the competent cells. Only a small fraction of cells
are competent, typically 0.1%. The heat shock has
been optimized for transformation reactions carried
out in a volume of 200 ul in a plastic Eppendorf tube;
optimizing this step for larger volumes is possible.
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This procedure requires 
transf onr.ants . 



up tc 2uq DMA per 



Sec. 14.3: Growth of the GP(vgPBD) population:

The transformed cells are grown first under non-
selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the osp-
pbd gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the IPBD
are harvested by a method appropriate to the package.

A high level of diversity can be generated by in
vitro variegated synthesis of DNA and this diversity
can be maintained passively through several
generations in an organism without positive selective
pressure. Loss or reduction in frequency of
deleterious mutations is advantageous for the purposes
of the present invention. As we do not know how one
might press E. coli or any other kind of cell to
actively maintain diversity, we specify that the vgDNA
must be used to prepare plasmids, that the plasmids
are used to transform cells, and that the selection
must be performed before more than a few generations
elapse. Moreover, subdividing the variegated
population before amplification in an organism by
removing a small sample (less than 10%) for further
work would result in loss of diversity; therefore, one
should use all or most of the synthetic DNA and most
or all of the transformed cells.

The harvested packages are now enriched for the
binding-to-target phenotype by use of affinity
separation involving the target material immobilized
on an affinity matrix. Packages that fail to bind to
the target material are washed away. If the packages
are bacteriophage or endospores, it may be desirable
to include a bacteriocidal agent, such as azide, in
the buffer to prevent bacterial growth. The buffers
used in chromatography must include: a) any ions or
other solutes needed to stabilize the target, and b)
any ions or other solutes needed to stabilize the PBDs
derived from the IPBD.

Sec. 15.1: Attaching the target material to a column:

Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography are used. These include derivatized
beads to which the target material is covalently
linked, or non-derivatized material to which the
target material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with
the directions of the manufacturer of each matrix
preparation with consideration of good presentation of
the target.



Sec. 15.2: Reducing selection due to non-specific
binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents such
as genetically defective GPs or a solution of
protein before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target or a different target
from the same class as the actual target prior to
affinity chromatography.

Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of
our population that exhibit non-specific binding to
the matrix or to molecules of the same class as the
target. If the target were horse heart myoglobin, for
example, a column supporting bovine serum albumin
could be used to trap GPs exhibiting PBDs with strong
non-specific binding to proteins. If cholesterol were
the target, then a hydrophobic compound, such as p-
tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. These sequences
could outnumber the PBDs having desirable binding
properties. Thus, the capacity of the initial column
that removes indiscriminately adhesive PBDs should be
greater (e.g. 5 fold greater) than the column that
supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, cellulose, etc.) in analysis of clones
carrying SBDs is used to eliminate enrichment for
packages that bind to the support material rather than
the target.

To separate the GP(PBD)s that carry PBDs that
show actual binding to the target from GP(PBD)s that
carry PBDs that do not actually show binding to the
target, the population of GPs is applied to an
affinity matrix under conditions compatible with the
intended use of the binding protein and the population
is fractionated by passage of a gradient of some
solute over the column. The process enriches for PBDs
having affinity for the target and for which the
affinity for the target is least affected by the
eluants used. The enriched fractions are those
containing viable GPs that elute from the column at
greater concentration of the eluant.

Any ions or cofactors needed for stability of
PBDs (derived from IPBD) or target must be included in
initial and elution buffers at appropriate levels. We
first remove GP(PBD)s that do not bind the target by
washing the matrix with the volume of the initial
buffer required to bring the optical density (at 260
nm or 280 nm) back to base line plus one void volume
(V_v), but not more than 5 V_v. The column is then
eluted with a gradient of increasing: a) salt, b) [H+]
(decreasing pH), c) neutral solutes, d) temperature
(increasing or decreasing), or e) some combination of
these factors. The solutes in each of the first three
gradients have been found generally to weaken non-
covalent interactions between proteins and bound
molecules. Salt is the most preferred solute for
gradient formation in most cases. Other solutes that
generally weaken non-covalent interaction between
proteins and bound molecules may also be used. "Salt"
includes solutions containing any or all of the
following ionic species:



Na+         K+                      Ca++                   Mg++
NH4+        Li+                     Sr++                   Ba++
Rb+         Cs+                     Cl-                    Br-
SO4--       HSO4-                   PO4---                 HPO4--
H2PO4-      CO3--                   HCO3-                  Acetate
Citrate     Standard L-Amino Acids  Standard nucleotides   Guanidinium Cl



Other ionic or neutral solutes may be used. All
solutes are subject to the necessity that they not
kill the genetic packages. Because bacteria continue
to metabolize during affinity separation, the choice
of buffer components is more restricted for bacteria
than for bacteriophage or spores. Neutral solutes,
such as ethanol, acetone, ether, or urea, are
frequently used in protein purification and are known
to weaken non-covalent interactions between proteins
and other molecules. Many of these species are,
however, very harmful to bacteria and bacteriophage.
Bacterial spores, on the other hand, are impervious to
most neutral solutes. Several passes may be made
through the steps in Sec. 15. Different solutes may
be used in different analyses, salt in one, pH in the
next, etc.

Recovery of packages that display binding to an
affinity column may be achieved in several ways,
including:

1) collect fractions eluted from the column with
a gradient as described above; fractions eluting
later in the gradient contain GPs more enriched
for genes encoding PBDs with high affinity for
the column,

2) elute the column with the target material in
soluble form,

3) flood the matrix with a nutritive medium and
grow the desired packages in situ,

4) remove parts of the matrix and use them to
inoculate growth medium,

5) chemically or enzymatically degrade the
linkage holding the target to the matrix so that
GPs still bound to target are eluted, or

6) degrade the packages and recover DNA with
phenol or other suitable solvent; the recovered
DNA is used to transform cells that regenerate
GPs.



It is possible to utilize combinations of these
methods. It should be remembered that what we want to
recover from the affinity matrix is not the GPs per
se, but the information in them. Recovery of viable
GPs is very strongly preferred, but recovery of
genetic material is essential. If cells, spores, or
virions bind irreversibly to the matrix but are not
killed, we can recover the information through in situ
cell division, germination, or infection respectively.
Proteolytic degradation of the packages and recovery
of DNA is not preferred.

Although degradation of the bound GPs and
recovery of genetic material is a possible mode of
operation, inadvertent inactivation of the GPs is very
deleterious. It is preferred that maximum limits for
solutes that do not inactivate the GPs or denature the
target or the column are determined. If the affinity
matrices are expendable, one may use conditions that
denature the column to elute GPs; before the target is
denatured, a portion of the affinity matrix should be
removed for possible use as an inoculum. As the GPs
are held together by protein-protein interactions and
other non-covalent molecular interactions, there will
be cases in which the molecular package will bind so
tightly to the target molecules on the affinity matrix
that the GPs can not be washed off in viable form.
This will only occur when very tight binding has been
obtained. In these cases, methods (3) through (5)
above can be used to obtain the bound packages or the
genetic messages from the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_o). The
population is applied at pH_b and the column is washed
thoroughly at pH_b. The column is then eluted with
buffer at pH_o and GPs that come off at the new pH are
collected and cultured. Similar procedures may be
used for other solution parameters, such as
temperature. For example, GP(vgPBD)s could be applied
to a column supporting insulin. After eluting with
salt to remove GPs with little or no binding to
insulin, we elute with salt and glucose to liberate
GPs that display PBDs that bind insulin or glucose in
a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the
case of phage, infection into a host so cultivated.
If the GPs have been inactivated by the
chromatography, the OCV carrying the osp-pbd gene must
be recovered from the GP, and introduced into a new,
viable host.



Sec. 15.6: Determining whether further enrichment is
needed:

The probability of isolating a GP with improved
binding increases by C_eff with each separation cycle.
Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K
separation cycles before attempting to isolate an SBD,
where K is such that the probability of isolating a
single SBD is 0.10 or higher.

K = the smallest integer >= log10(0.10 N)/log10(C_eff)

For example, if N were 1.0 x 10^7 and C_eff were
6.31 x 10^2, then log10(1.0 x 10^6)/log10(6.31 x 10^2) =
6.0000/2.8000 = 2.14. Therefore we would attempt to
isolate SBDs after the third separation cycle. After
only two separation cycles, the probability of finding
an SBD is (6.31 x 10^2)^2/(1.0 x 10^7) = .04 and
attempting to isolate SBDs might be profitable.
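
The cycle-count rule above is easy to evaluate numerically. The following is a minimal sketch with illustrative function names; it assumes, as the text does, that each separation cycle multiplies the chance of isolating an SBD by C_eff, starting from 1/N.

```python
import math

def cycles_needed(n_sequences, c_eff, target_probability=0.10):
    """Smallest integer K with K >= log10(p N) / log10(C_eff)."""
    k = math.ceil(math.log10(target_probability * n_sequences)
                  / math.log10(c_eff))
    return max(k, 1)  # at least one separation cycle

def probability_after(cycles, n_sequences, c_eff):
    """Chance of isolating an SBD after the given number of cycles."""
    return min(1.0, c_eff ** cycles / n_sequences)

# The worked example from the text: N = 1.0 x 10^7, C_eff = 6.31 x 10^2.
k = cycles_needed(1.0e7, 6.31e2)
p_two = probability_after(2, 1.0e7, 6.31e2)
print(k, round(p_two, 2))
```

This reproduces the three cycles and the probability of about .04 after two cycles given above.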

Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured in a growth step
that is similar to that described in Sec. 14.3. If K
separation cycles have been completed, samples from a
number, e.g. 32, of these clonal isolates are tested
for elution properties on the (target) column. If
none of the isolated, genetically pure GPs show
improved binding to target, or if K cycles have not
yet been completed, then we pool and culture, in a
manner similar to the manner set forth in Sec. 14.3,
the GPs from the last few fractions eluted (see Sec.
15.4) that contained viable GPs and from the GPs
obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure described in Sec. 15. This cyclic
enrichment may continue N_chrom passes or until an SBD
is isolated.

If one or more of the isolated GPs has improved
retention on the (target) column, we determine whether
the retention of the candidate SBDs is due to affinity
for the target material as follows. A second column
is prepared using a different support matrix with the
target material bound at the optimal density. The
elution volumes, under the same elution conditions as
used previously (see Sec. 15.3), of candidate GP(SBD)s
are compared to each other and to GP(PPBD of this
round). If one or more candidate GP(SBD)s has a
larger elution volume than GP(PPBD of this round),
then we pick the GP(SBD) having the highest elution
volume and proceed to characterize the population (see
Sec. 15.7). If none of the candidate GP(SBD)s has
higher elution volume than GP(PPBD of this round),
then we pool and culture, in a manner similar to the
manner used previously (Sec. 15.3), the GPs from the
last few fractions that contained viable GPs and the
GPs obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure of Sec. 15.

If all of the SBDs show binding that is superior
to PPBD of this round, we pool and culture the GPs
from the last fraction that contains viable GPs and
from the inoculum taken from the column. This
population is re-chromatographed at least one pass to
fractionate further the GPs based on K_d.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.

Sec. 15.7: Characterizing the Population:

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is
caused by the artificial chimeric gene by excising the
osp-sbd gene and crossing it into the parental GP. We
also ligate the deleted backbone of each GP from which
the osp-sbd is removed and demonstrate that each
backbone alone cannot confer binding to the target on
the GP. We sequence the osp-sbd gene from several
clonal isolates. Primers for sequencing are chosen
from the DNA flanking the osp-pbd gene or from parts
of the osp-pbd gene that are not variegated.

Sec. 15.8: Testing of binding affinity:

For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced
as a free protein. Because numerous unique
restriction sites were built into the inserted domain,
it is easy to subclone the gene at any time. Each SBD
protein is purified by normal means, including
affinity chromatography. Physical measurements of the
strength of binding are then made on each free SBD
protein by one of the following methods: 1) alteration
of the Stokes radius as a function of binding of the
target material, measured by characteristics of
elution from a molecular sizing column such as
agarose, 2) retention of radiolabeled binding protein
on a spun affinity column to which has been affixed
the target material, or 3) retention of radiolabeled
target material on a spun affinity column to which has
been affixed the binding protein. The measurements of
binding for each free SBD are compared to the
corresponding measurements of binding for the PPBD.

In each assay, we measure the extent of binding
as a function of concentration of each protein, and
other relevant physical and chemical parameters such
as salt concentration, temperature, pH, and prosthetic
group concentrations (if any).

In addition, the SBD with highest affinity for
the target from each round is compared to the best SBD
of the previous round (IPBD for the first round) and
to the IPBD (second and later rounds) with respect to
affinity for the target material. Successive rounds
of mutagenesis and selection-through-binding yield
increasing affinity until desired levels are achieved.

If we find that the binding is not yet
sufficient, we must decide which residues to vary next
(see Sec. 16.0). If the binding is sufficient, then
we now have an expression vector bearing a gene
encoding the desired novel binding protein.

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind
fluorescent labeled target with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different. GPs isolated using target
labeled with a first dye are cultured. These GPs are
then tested with target labeled with a second dye.

Electrophoretic affinity separation uses
unaltered target so that only other ions in the buffer
can give rise to artifactual binding. Artifactual
binding to the gel material gives rise to retardation
independent of field direction and so is easily
eliminated. A variegated population of GPs will have
a variety of charges. The following 2D
electrophoretic procedure accommodates this variation
in the population.

First the variegated population of GPs is
electrophoresed in a gel that contains no target
material. The electrophoresis continues until the GPs
are distributed along the length of the lane. The
gels described by Serwer for phage are very low in
agarose and lack mechanical stability. The target-
free lane in which the initial electrophoresis is
conducted is separated from a square of gel that
contains target material by a removable baffle. After
the first pass, the baffle is removed and a second
electrophoresis is conducted at right angles to the
first. GPs that do not bind target migrate with
unaltered mobility while GPs that do bind target will
separate from the majority that do not bind target. A
diagonal line of non-binding GPs will form. This line
is excised and discarded. Other parts of the gel are
dissolved and the GPs cultured.

Sec. 16.0: The Next Variegation Cycle:

We now consider which residues of the PBD should
be varied in the next variegation cycle. The general
rule is to preserve as much accumulated information as
possible. If the level of variegation in the previous
variegation cycle was correctly chosen, then the amino
acids selected to be in the residues just varied are
the ones best determined. The environment of other
residues has changed, so that it is appropriate to
vary them again. Because there are always more
residues in the principal (Sec. 12.1.1) and secondary
sets (Sec. 13.1.2) than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that
varying all the residues except those varied in the
previous cycle does not allow a high enough level of
diversity, then residues varied in the previous cycle
might be varied again. For example, if M_ntv (the
number of independent transformants that can be
produced from *D100 of DNA) and C_sensi (the
sensitivity of the affinity separation) were such that
seven residues could be varied, and if the principal
and secondary sets contained 13 residues, we would
always vary seven residues, even though that implies
varying some residue twice in a row. In such cases,
we would pick the residues just varied that contain
the amino acids of highest abundance in the variegated
codons used.

It is the accumulation of information that allows
the process to select those protein sequences that
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules
involve twenty or more residues. Complete variation
of twenty residues would generate 10^26 different
proteins. By dividing the residues that lie close
together in space into overlapping groups of five to
seven residues, we can vary a large surface but never
need to test more than 10^7 to 10^9 candidates at once,
a savings of 10^19 to 10^17 fold. The power of
selection with accumulation of information is well
illustrated in Chapter 3 of DAWK86.
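
The scale of the saving can be checked with exact integer arithmetic. This sketch assumes complete variation means all 20 standard amino acids at every varied residue; the function name is illustrative.

```python
def sequence_count(n_residues, n_amino_acids=20):
    """Number of distinct sequences when every residue is fully varied."""
    return n_amino_acids ** n_residues

full_surface = sequence_count(20)   # all twenty residues at once
group_of_five = sequence_count(5)   # one small overlapping group
group_of_seven = sequence_count(7)  # one large overlapping group

# Fold reduction in the number of candidates tested at once.
saving_small_group = full_surface // group_of_five
saving_large_group = full_surface // group_of_seven

print(f"{full_surface:.2e} {group_of_five:.2e} {group_of_seven:.2e}")
print(f"{saving_small_group:.2e} {saving_large_group:.2e}")
```

The counts agree in order of magnitude with the figures quoted in the text: about 10^26 for twenty residues, and millions to about a billion for groups of five to seven.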

Having picked the residues to vary, we again set
the range of variegation for each residue according to
the principles set forth in Sec. 13.2, design the vgDNA
encoding the desired mutants (Sec. 13.3), clone the
vgDNA into GPs (Sec. 14), and select-by-binding-to-
target those GPs bearing SBDs (Sec. 15).

Sec. 17.0: OTHER CONSIDERATIONS:

Sec. 17.1: Joint selections:

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to
prepare two selection columns, one with material A and
the other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for
B ("reverse affinity chromatography"). In the
preceding specification, the initial column supported
some other molecule simply to remove GP(PBD)s that
displayed PBDs having indiscriminate affinity for
surfaces.

It may be necessary to amplify the population
that does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD has been
selected for having affinity for A. The optimum order
of interactions might be determined empirically.

For example, to obtain an SBD that binds A but
not B, three columns could be connected in series: a)
a column supporting some compound, neither A nor B, or
only the matrix material, b) a column supporting B,
and c) a column supporting A. A population of
GP(vgPBD)s is applied to the series of columns and the
columns are washed with the buffer of constant ionic
strength that is used in the application. The columns
are uncoupled, and the third column is eluted with a
gradient to isolate GP(PBD)s that bind A but not B.

One can also generate molecules that bind to both
A and B. In this case we can use a 3D model and
mutate one face of the molecule in question to get
binding to A. One can then mutate a different face
to produce binding to B. When an SBD binds at least
somewhat to both A and B, one can mutate the chain by
Diffuse Mutagenesis to refine the binding and use a
sequential joint selection for binding to both A and
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A
but not B must bind to A near the residues that are
mutated in B. If the mutations were picked to be in
the active site of A (assuming A has an active site),
then an SBD that binds A but not B will bind to the
active site of A and is likely to be an inhibitor of
A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced.
The fusion protein will have affinity for both A and B
because one of its domains binds A and the other binds
B.

One can also generate binding proteins with
affinity for both A and B, such that these materials
will compete for the same site on the binding protein.
We guarantee competition by overlapping the sites for
A and B. Using the procedures of the present
invention, we first create a molecule that binds to
target material A. We then vary a set of residues
defined as: a) those residues that were varied to
obtain binding to A, plus b) those residues close in
3D space to the residues of set (a) but that are
internal and so are unlikely to bind directly to
either A or B. Residues in set (b) are likely to make
small changes in the positioning of the residues in
set (a) such that the affinities for A and B will be
changed by small amounts. Members of these
populations are selected for affinity to both A and B.



Sec. 17.2: Selection for non-binding:



The method of the present invention can be used
to select proteins that do not bind to selected
targets. Consider a protein of pharmacological
importance, such as streptokinase, that is antigenic
to an undesirable extent. We can take the
pharmacologically important protein as IPBD and
antibodies against it as target. Residues on the
surface of the pharmacologically important protein
would be variegated and GP(PBD)s that do not bind to
an antibody column would be collected and cultured.

Surface residues may be identified in several ways,
including: a) from a 3D structure, b) from
hydrophobicity considerations, or c) chemical
labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except now we pick residues
that are widely spaced so that we leave as little as
possible of the original surface unaltered.



Destroying binding frequently requires only that
a single amino acid in the binding interface be
changed. If polyclonal antibodies are used, we face
the problem that all or most of the strong epitopes
must be altered in a single molecule. Preferably, one
would have a set of monoclonal antibodies, or a narrow
range of antibody species. If we had a series of
monoclonal antibody columns, we could obtain one or
more mutations that abolish binding to each monoclonal
antibody. We could then combine some or all of these
mutations in one molecule to produce a
pharmacologically important protein recognized by none
of the monoclonal antibodies. Such mutants must be
tested to verify that the pharmacologically
interesting properties have not been altered to an
unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have
only polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein or
into residues thought to be on the surface of the
pharmacologically important protein so that a
population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is
eluted with a salt gradient. The GPs that elute at
the lowest concentration of salt are those which bear
pharmacologically important proteins that have been
mutated in a way that eliminates binding to the
antibodies having maximum affinity for the
pharmacologically important protein. The GPs eluting
at the lowest salt are isolated and cultured. The
isolated SBD becomes the PPBD to further rounds of
variegation so that the antigenic determinants are
successively eliminated.



Sec. 17.3: Selection of PBDs for retention of
structure:

Let us take an SBD with known affinity for a
target as PPBD to a variegation of a region of the PBD
that is far from the residues that were varied to
create the SBD. We can use the target as an affinity
molecule to select the PBDs that retain binding for
the target, and that presumably retain the underlying
structure of the IPBD. The variegations in this case
could include insertions and deletions that are likely
to disrupt the IPBD structure. We could also use the
IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional
residues after residue 26 and select GPs that display
PBDs having specific affinity for AfM(BPTI). Residue
26 is chosen because it is in a turn and because it is
about 25 A from K15, a key amino acid in binding to
trypsin.

The underlying structure is most likely to be 
retained if insertions or deletions are made at loops 
or turns. 

Sec. 17.4: Created binding proteins not unique:

For each target, there are a large number of SBDs
that may be found by the method of the present
invention. The process relies on a combination of
protein structural considerations, probabilities, and
targeted mutations with accumulation of information.
To increase the probability that some PBD in the
population will bind to the target, we generate as
large a population as we can conveniently subject to
selection-through-binding in one experiment. Key
questions in management of the method are "How many
transformants can we produce?", and "How small a
component can we find through selection-through-
binding?". Geneticists routinely find mutations with
frequencies of one in 10^10 using simple, powerful
selections; we experimentally determine the
sensitivity of our procedure. The optimum level of
variegation is determined by the maximum number of
transformants and the selection sensitivity, so that
for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins
with higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass
of elution from an affinity plate have been
demonstrated (SMIT85). Three rounds of such
enrichment could produce 10^9-fold enrichment, and
additional rounds may be added if necessary.
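
The relation between per-pass enrichment and the number of rounds can be sketched as follows. The sketch assumes idealized enrichment in which every round multiplies by the same factor, which real separations will only approximate; the function names are hypothetical.

```python
def total_enrichment(fold_per_round: int, rounds: int) -> int:
    """Cumulative enrichment if each round multiplies by the same factor."""
    return fold_per_round ** rounds


def rounds_needed(target_fold: int, fold_per_round: int) -> int:
    """Smallest number of rounds whose cumulative enrichment reaches target_fold."""
    if fold_per_round <= 1:
        raise ValueError("each round must enrich by more than 1-fold")
    rounds, total = 0, 1
    # Integer multiplication avoids floating-point rounding in the comparison.
    while total < target_fold:
        total *= fold_per_round
        rounds += 1
    return rounds


# Three 1000-fold rounds, as in the text.
print(total_enrichment(1000, 3))
# Rounds needed to find a component present at about 1 in 10^9.
print(rounds_needed(10**9, 1000))
```

A lower per-pass enrichment simply raises the number of rounds required, which is why additional rounds may be added if necessary.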

Use of different variation schemes can yield
different binding proteins. For any given target,
there is a large plurality of proteins that will bind
to it. Thus, if one binding protein turns out to be
unsuitable for some reason (e.g. too antigenic), the
procedure can be repeated with different variation
parameters. For example, one might choose different
residues to vary or pick a different nt distribution
at variegated codons so that a new distribution of
amino acids is tested at the same residues. Even if
the same principal set of residues is used, one might
obtain a different SBD if the order in which one picks
subsets to be varied is altered.

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained
from one selection and then introduces other mutations
in the same domain will work. The limiting factors
are the number of independent transformants that can
be produced and the amount of enrichment one can
achieve through affinity separation. Therefore the
preferred embodiment uses a method of mutagenesis that
focuses mutations into those residues that are most
likely to affect the binding properties of the PBD and
are least likely to destroy the underlying structure
of the IPBD.

Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use single-stranded-oligo-
nt-directed mutagenesis on lambda without the need for
unique restriction sites. No one has used single-
stranded-oligo-nt-directed mutagenesis to introduce
the high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.

BPTI-Derived Binding Protein for HHMb, Displayed by M13
Phage

Presented below is a hypothetical example of a
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.

By hypothesis, we set the following technical
capabilities:

Y_D100     500 ng/synthesis of ssDNA 100 bases long,
Y_D60      10 ug/synthesis of ssDNA 60 bases long,
Y_D20      1 mg/synthesis of ssDNA 20 bases long.
M_DNA      100 bases
Y_pl       1 mg/l
L_eff      0.1% for blunt-blunt,
           4% for sticky-blunt,
           11% for sticky-sticky.
M_ntv      5 x 10^8
C_eff      900-fold enrichment
C_sensi    1 in 4 x 10^8
N_chrom    10 passes
S_err      0.05

Example I, Part I

In this example, we will use M13 as a replicable
GP and BPTI as IPBD. The considerations that lead to
these choices are discussed. In Part I, we are
concerned only with getting BPTI displayed on the outer
surface of an M13 derivative. Variable DNA may be
introduced in the osp-ipbd gene, but not within the
region that codes for the trypsin-binding region of
BPTI. Once BPTI is displayed on the outer surface
of an M13 derivative, we proceed to Part II to optimize
the affinity separation procedures.

We consider various GPs and, for this example,
choose a filamentous bacteriophage of E. coli, M13. We
prefer phage over vegetative bacterial cells because
phage are much less metabolically active. We prefer
phage over spores because the molecular mechanisms of
the virion formation and 3D structure of the virion are
much better understood than are the corresponding
processes of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.
1.3.1.

Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well
understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that
sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are
also satisfied: M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, FRIT85). M13 is stable
toward physical agents: temperature (10[illegible] of
phage survive 30 minutes at 85°C), shear (Waring
blender does not kill), desiccation (not applicable),
radiation (not applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2
(SMIT85)), surface active agents (not applicable),
chaotropes (guanidinium Cl, 6.0 M), [illegible] (no
special sensitivities), organic solvents (ether and
other organic solvents are lethal (MARV78)), proteases
(not applicable, HHMb not a protease). M13 is not
known to be sensitive to other enzymes.

M13 genome is 6423 b.p. and the sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt directed mutagenesis (FRIT85).
M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage, but
doesn't lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using the lacUV5 promoter and used in
conjunction with the LacI^q repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on the surface of the virion. No fusions to M13 gene
VIII protein have been reported. The 2D structure of
M13 coat protein is implicit in the 3D structure.
Mature M13 gene VIII protein has only one domain.
There are four minor proteins: gene III, VI, VII, and
IX. Each of these minor proteins is present in about
5 copies per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure makes attachment of IPBD to the amino
terminus of mature M13 coat protein (M13 CP) quite
attractive (See Sec. 1.3.2). Should direct fusion of
BPTI to M13 CP fail to cause BPTI to be displayed on
the surface of M13, we will vary part of the BPTI
sequence and/or insert short random DNA sequences
between BPTI and M13 CP (Sec. 1.3.4).

Smith (SMIT85) and de la Cruz et al. (DELA88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
can not be made to appear on the virion outer surface
by fusing the bpti gene to the m13cp gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.



The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha helical axis is unimportant; the amino
terminus is exposed for all azimuths.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site.
(See Sec. 1.3.2).

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1

         1    1    2    2    3    3    4    4    5
    5    0    5    0    5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA
                       ^ (arrow: cleavage between A23 and A24)

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are given in Table 1. The best site
for inserting a novel protein domain into M13 CP is
after A23 because SP-I cleaves the precoat protein
after A23, as indicated by the arrow. Proteins that
can be secreted will appear connected to mature M13 CP
at its amino terminus. Because the amino terminus of
mature M13 CP is located on the outer surface of the
virion, the introduced domain will be displayed on the
outside of the virion.
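
The placement of an introduced domain relative to the SP-I cleavage site can be sketched as follows. The pre-coat string is a reconstruction of AA_seq1 and should be treated as an assumption to be checked against SCHA78; the function names and the placeholder domain are hypothetical.

```python
# Reconstructed M13 pre-coat sequence (assumption; verify against SCHA78).
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFA"                             # residues 1-23, signal peptide
    "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"  # residues 24-73, mature coat
)
CLEAVAGE_AFTER = 23  # SP-I cleaves after A23


def split_precoat(seq: str, cleavage_after: int = CLEAVAGE_AFTER):
    """Return (signal peptide, mature coat protein) for a pre-coat sequence."""
    return seq[:cleavage_after], seq[cleavage_after:]


def fuse_domain(seq: str, domain: str, cleavage_after: int = CLEAVAGE_AFTER) -> str:
    """Insert a domain after the cleavage site, so that after processing the
    domain sits at the amino terminus of the mature coat protein."""
    signal, mature = split_precoat(seq, cleavage_after)
    return signal + domain + mature


signal, mature = split_precoat(PRECOAT)
print(len(signal), len(mature))

# "XXXX" stands in for an introduced domain such as BPTI (placeholder only).
fusion = fuse_domain(PRECOAT, "XXXX")
print(fusion[CLEAVAGE_AFTER:CLEAVAGE_AFTER + 4])
```

The sketch only tracks residue positions; whether a given fusion is actually secreted and cleaved must be determined experimentally.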

BPTI is chosen as IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.



Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues,
vide infra, we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at
high resolution by x-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and by NMR (WAGN87). In one of the X-ray
structures deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for A58,
indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.



BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).



BPTI has the added advantage that at least 32
homologous proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14 and the
composite of amino acid types occurring at each residue
is shown in Table 15.

BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin, K_d = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).

Stereo Figure 7 shows the alpha carbons of BPTI
plus the side groups of conserved residues; all four
atoms of conserved glycines are shown. All of the
conserved residues are buried; of the seven fully
conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is
given in Table 16 which was calculated from the entry
"5PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 A, the atomic radii given in
Table 7, and the method of Lee and Richards (LEEB71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 7 x 10^42 different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
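
The size of this estimate can be put in perspective with a short sketch. The total is the product, over the non-conserved residues, of the number of amino acid kinds observed at each residue; the per-residue counts belong to Table 15 and are not reproduced here, so the sketch only works backward from the quoted total to an average. The function names and the three-residue example are hypothetical.

```python
import math


def sequence_count(kinds_per_residue):
    """Number of sequences from independent substitution at each residue."""
    return math.prod(kinds_per_residue)


def mean_kinds(total_sequences: float, n_residues: int) -> float:
    """Geometric-mean number of amino acid kinds per residue implied by a total."""
    return total_sequences ** (1.0 / n_residues)


# Toy example (hypothetical counts): three residues allowing 2, 3 and 4 kinds.
print(sequence_count([2, 3, 4]))

# Figures quoted in the text: about 7 x 10^42 sequences over 51 residues.
print(round(mean_kinds(7e42, 51), 2))
```

On these figures each non-conserved residue accommodates, on geometric average, roughly seven kinds of amino acid.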



BPTI will be useful as an IPBD for macromolecules.
(See Sec. 2.1.1) BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH, thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined
as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV. (See
Sec. 3). Wild-type M13 does not confer any resistances
on infected cells; M13 is a pure parasite. A
"phagemid" is a hybrid between a phage and a plasmid,
and is used in this invention. Double-stranded plasmid
DNA isolated from phagemid-bearing cells is denoted by
the standard convention, e.g. pXY24. Phage prepared
from these cells would be designated XY24. Phagemids
such as Bluescript K/S (sold by Stratagene) are not
suitable for our purposes because Bluescript does not
contain the full genome of M13 and must be rescued by
coinfection with competent wild-type M13. Such
coinfections will likely lead to genetic recombination
yielding heterogeneous phage unsuitable for the
purposes of the present invention.

It is also well known that plasmids containing the
ColE1 origin of replication can be greatly amplified if
protein synthesis is halted in log-phase culture.
Protein synthesis can be halted by addition of
chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
beta lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.

The OCV for the current example is constructed by
a process illustrated in Figure 8. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three non-
mutagenic primers are used:

5' (232o) GCC CCC TCT CAC CGT CCC CCT (22 52) 3'  wtM13
3'        ccg ccg aga etc cca cog cc:         5'  olig#24,

5' (4854) OCT CCT CCC TCT CAC CCC CCC (4875) 3'  wtM13
3'        eg. - *, cga ccg aga gtc gcg >:zri 5'  olig#25,

and

5' (34 51) CCC CTG ACC GTC CCT C7C CCC (34 3 1) 3'
3'         ggc cac teg eac cca gag cgc          5'  olig#26.

Olig#24 is complementary to a segment near the end of
M13 gene III and olig#25 is complementary to part of
M13 gene IV (SCHA78). Olig#26 is part of the amp^R gene
from pBR322 (MANI82, Appendix B); the numbers shown
refer to pBR322 base pair numbers. Note that pLG2 and
its derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.

The genetic engineering procedures needed to
construct the OCV are standard. All restriction
digests use commercially available enzymes and are
carried out under conditions recommended by the
supplier. All restriction fragments of DNA are
purified by HPLC or electrophoresis from agarose gels
as described elsewhere in the present invention.
Competent E. coli are preferably prepared by a modified
version of the procedure of Maniatis (MANI82) given in
the generic detail section. M13 and its engineered
derivatives are infected into E. coli strain PE384
(F+, Rec-, Sup+, Amp^S). Plasmid DNA of M13 derivatives
is transferred into E. coli strain PE383 (F-, Rec-,
Sup+, Amp^S) so that we avoid multiple infections that
might arise once phage are produced. Isolation of M13
phage is by the procedure of Salivar et al. (SALI64);
isolation of replicative form (RF) M13 is by the
procedure of Jazwinski et al. (JAZW73a and JAZW73b).
Isolation of plasmids containing the ColE1 origin of
replication is by the method of Maniatis (MANI82).

DNA sequencing is by the method of Sanger
(AUSU87). Virions of M13 derivatives contain circular
ss DNA that is called the viral + strand. Base numbers
are assigned from an agreed origin and in ascending
order in the 5'-to-3' direction of the viral + strand.
Conventionally, this DNA is drawn with the 5'-to-3'
direction clockwise and corresponding to increasing
base number. In relation to the genomes of M13
derivatives, we will use "up" or "above" to mean higher
base number or further along in clockwise direction.
Similarly "down" and "below" will mean lower base
number or further along in the counterclockwise
direction. To determine the base sequence of part of
an M13 derivative, one needs a sequencing primer that
is complementary to a region above and within about 100
bases of the region to be sequenced. Because the OCV
is constructed from parts of M13mp18, parts of pBR322,
and synthetic DNA, the sequence of flanking regions is
always known.

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as kanamycin, could be used. (The New England
BioLabs 1988/89 catalogue contains a genetic map of
pBR322 on page 106.) The plasmid pBR322 also contains
the ColE1 origin of replication. The restriction sites
Acc I at 2246 and Aat II at 4286 are the most
convenient places to cut pBR322 to obtain both an
intact amp^R gene and the ColE1 origin of replication
with ends suitable for ligation to other DNA.

The plasmid pBR322 contains a unique AlwN I site
at base 2886 that is between the amp^R gene and ori.
There is a unique AlwN I site in M13mp18 at base 2187.
When the Acc I-to-Aat II fragment of pBR322 is ligated
into M13mp18, there will be two AlwN I sites and no
easy way to excise the amp^R gene. Thus we convert the
AlwN I site of pBR322 into an Xba I site that will be
unique in all the DNA constructs of the present
example. The two oligo-nts:

5' ccgaTCTACActag tcgCCA 3'   olig#60
3' CCTagctACATCTqa tcagc 5'   olig#61


are synthesized by standard methods and annealed. The
AlwN I site at base 2886 in pBR322 has the sequence 5'-
CAGCCACTG-3'. Plasmid pBR322 is cut with AlwN I and
mixed with the synthetic ds DNA and ligated. Cells are
transformed and selected for tetracycline resistance.
Tetracycline resistant colonies are screened for the
correct insert by restriction digestion with Xba I that
cuts the correct construction but not pBR322. The
correct construction is called pLG322. Plasmid pLG322
differs from pBR322 only by the replacement of the
AlwN I site with an Xba I site.

The plasmid pLG322 contains a second Acc I
restriction site at base 651 so that digestion of
pLG322 with Aat II and Acc I yields three fragments,
one of about 2041 bases (that we want), one of about
723 bases, and one of about 1600 bases. To facilitate
isolation of the 2041-base fragment, we also digest
pLG322 with Sty I that cuts at base 1369. The Sty I
cut reduces the 1600-base fragment to two fragments of
about 700 and about 900 bases each. We purify the
2041-nt fragment by HPLC or agarose gel
electrophoresis.
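
The fragment sizes quoted for this digest can be checked with a short sketch that treats the plasmid as a circle cut at the stated base numbers. It assumes a length of 4361 bases for pLG322 (the length of pBR322, ignoring the small change introduced by the Xba I replacement) and ignores the exact position of each cut within its recognition site, so results agree with the text only approximately. The function name is hypothetical.

```python
PLASMID_LENGTH = 4361  # assumption: pBR322 length used for pLG322


def circular_fragments(plasmid_length, cut_positions):
    """Return fragment lengths from cutting a circular molecule at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut circle
    lengths = []
    for i, start in enumerate(cuts):
        end = cuts[(i + 1) % len(cuts)]
        # The modulo handles the fragment that spans the numbering origin;
        # a single cut linearizes the circle and gives the full length.
        lengths.append((end - start) % plasmid_length or plasmid_length)
    return lengths


ACC_I = [651, 2246]
AAT_II = [4286]
STY_I = [1369]

# Aat II + Acc I: three fragments.
print(sorted(circular_fragments(PLASMID_LENGTH, ACC_I + AAT_II)))
# Adding Sty I splits the roughly 1600-base fragment in two.
print(sorted(circular_fragments(PLASMID_LENGTH, ACC_I + AAT_II + STY_I)))
```

With Sty I included, the wanted fragment of roughly 2040 bases is well separated in size from the remaining three, which is the reason given for the additional digest.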

M13mp18, sold by New England BioLabs, contains
neither Aat II nor Acc I sites. Therefore we insert an
adaptor that allows us to insert the Aat II-to-Acc I
fragment of pLG322 that carries the amp^R gene and the
ColE1 origin of replication into a desirable place in
M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ
gene that are not useful to the purposes of the present
invention. By cutting M13mp18 with Ava II at the
unique site at 5914 and with Bsu36 I at the unique site
at 6508 and discarding the approximately 600
intervening base pairs, we eliminate all recognition
sites of the enzymes shown in Table 18 from M13mp18.
M13mp18 itself is not cut by the enzymes listed in
Table 19. Among the enzymes in Tables 18 and 19, those
listed in Table 20 have recognition sites within the
Acc I-to-Aat II fragment of pLG322 that contains the
amp^R gene and the ColE1 origin of replication.



Therefore the following adaptor is synthesized,

5' CACCCACCTCtgcctcGTATACCCCACCCcatagctCC 3'     olig#1
3' CCTGCACacggagCATATCCCCTCGCgtatcgaCCACT 5'     olig#2
   Ava II | Aat II |   | Acc I | Rsr II |   | Bsu36 I

where the Ava II and Aat II sites share one GC base
pair, and the Acc I and Rsr II sites share a different
CG pair. The two 33-base oligo-nts are synthesized by
a standard procedure described elsewhere in the present
invention; the oligo-nts are annealed to each other.
The bases shown in lower case are spacers. In a later
step, we will cut this adaptor with both Aat II and
Acc I; for both enzymes to cut efficiently, there must
be at least five bases between the sites. Similarly,
we will begin the construction of the pbd gene by
inserting DNA at the Rsr II and Bsu36 I sites; thus
these sites are separated by seven bases to allow
simultaneous cuts.

The annealed adaptor is ligated with RF M13mp18
that has been cut with both Ava II and Bsu36 I and
purified by HPLC or polyacrylamide gel electrophoresis
(PAGE). Cells are transformed with the ligated DNA.
DNA from colonies selected on LB agar with ampicillin
is screened by restriction digestion. The desired
construction can be cut with Rsr II or Acc I, but not
by any of the enzymes listed in Table 18. Plasmid DNA
from colonies that have the predicted restriction
digestion is sequenced in the region of the insert to
verify the construction. This construction retains
both the Ava II and the Bsu36 I sites. The resulting
construct is called pLG1.

The plasmid pLG1 is grown by standard techniques
and DNA isolated and cut with both Aat II and Acc I.
After ligation, there will still be Aat II and Acc I
restriction sites at the ends of the inserted DNA. The
Aat II-to-Acc I fragment of pBR322 is ligated to the
backbone of pLG1. The ligated DNA is used to transform
competent E. coli that are plated on ampicillin-
containing plates after a short grow-out.



Ampicillin-resistant colonies are picked. Plasmid
DNA of the phagemid from the resistant colonies is
digested with Bsu36 I and Rsr II. To verify the
construction, DNA from phagemids with the correct
restriction digestion pattern is sequenced: a) from
about 20 bases above the Bsu36 I site to about 20 bases
below the Rsr II site, and b) for about 30 bases either
side of the unique Ava II site. The correct construct
is named pLG2.

The Acc I restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with Acc I, treated with Klenow fragment
and dATP and dTTP to make it blunt and then religated.
The ligated DNA is used to transform competent cells;
after a short grow-out, ampicillin-resistant colonies
are selected. Restriction digestion is used to screen
phagemid DNA from these colonies; the desired product
cannot be cut with Acc I. To verify the construction,
DNA from colonies lacking an Acc I restriction site is
sequenced from about 20 bases above the former Acc I
site to about 20 bases below it. The cloning vector,
named pLG3, is now ready for stepwise insertion of the
osp-ipbd gene.

We are now ready to design a gene (See Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23 leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (shown underscored) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.

AA_seq2

         1    1    2   |2    3    3    4    4    5
    5    0    5    0   v5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                       ---------------------------

    5    6    6    7    7    8    8    9    9   10
    5    0    5    0    5    0    5    0    5    0
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

   10   11   11   12   12   13
    5    0    5    0    5    0
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS



We adopt the convention that sequence numbers of
fusion proteins refer to the fusion, as coded, unless
otherwise noted. Thus the alanine that begins M13 CP
is referred to as "number 82", "number 1 of M13 CP", or
"number 59 of the mature BPTI-M13 CP fusion".
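
The numbering convention can be illustrated with a small sketch. The three sequences below are assumptions: they are the M13 gene VIII signal peptide, mature BPTI, and mature M13 coat protein as commonly tabulated, written out here only so that the offsets can be checked; the helper names are hypothetical.

```python
# Assumed sequences (not copied from a table in this document).
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"                                      # 23 residues
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"     # 58 residues
M13_CP = "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"           # 50 residues

FUSION = SIGNAL + BPTI + M13_CP

# Offset of each domain within the fusion as coded.
OFFSETS = {
    "signal": 0,
    "BPTI": len(SIGNAL),
    "M13 CP": len(SIGNAL) + len(BPTI),
}


def fusion_number(domain, index):
    """Number (1-based) in the fusion, as coded, of residue `index` of a domain."""
    return OFFSETS[domain] + index


def mature_number(fusion_index):
    """Number of a residue in the mature fusion, after removal of the signal."""
    return fusion_index - len(SIGNAL)


print(fusion_number("M13 CP", 1), mature_number(fusion_number("M13 CP", 1)))
```

With these assumed sequences, the first alanine of M13 CP is number 82 as coded and number 59 in the mature fusion, and the first and last-but-three cysteines of BPTI (C5 and C55) fall at 28 and 78.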

The osp-ipbd gene is regulated by the lacUV5
promoter, so that the level of expression can be
regulated by the concentration of IPTG supplied in the
growth medium. (See Sec. 4.1). The host strain of E.
coli should harbor the lacIq gene that represses the
lacUV5 promoter to a greater extent than lacI+. The
osp-ipbd gene is ended by the trp attenuator so that
RNA polymerase will not read through into subsequent
genes. The osp-ipbd gene is expressed and processed in
parallel with the wild-type gene VIII. The novel
protein, that consists of BPTI tethered to a M13 CP
domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); 1%
incorporation of the chimeric protein results in about
30 copies of the protein exposed on the surface. If
this is insufficient, additional copies may be
provided.

Figure 9 shows, in stereo, a hypothetical model of
a short segment of the coat of a derivative of M13 in
which some coat protein monomers are fusions of mature
BPTI to the amino terminus of the normal M13 CP. The
figure shows only protein C-alphas; the DNA, not shown,
lies inside the cylinder. The model of M13 coat is
after the model for f1 of Marvin and colleagues
(BANN81). The BPTI domain is taken from the Brookhaven
Protein Data Bank entry "6PTI" and was attached by
standard model building methods that insure that
covalent bond lengths and angles are close to
acceptable values. The space between the alpha helical
main chains is filled by protein side groups so that
the DNA is protected from solvent. The figure is not
meant to suggest that BPTI fused to M13 CP will adopt
the conformation shown, which is arbitrary. Rather the
model shows that the fusion protein could fit into the
supramolecular structure in a stereochemically
acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The osp-ipbd gene will use: a) the lacUV5
promoter, b) a Shine-Dalgarno sequence having high
homology to natural Shine-Dalgarno sequences, c) a
completely synthetic coding region having codons
assigned to optimize placement of restriction sites,
and d) the trp attenuator as transcriptional
terminator. (See Secs. 4.1 and 4.2).

The ambiguous DNA sequence coding for AA_seq2,
shown in Table 3, is examined by PPOSPECT for places
where recognition sites for any of the enzymes listed
in Table 21 could be created without altering the
amino-acid sequence. (See Sec. 4.3). A master table
of enzymes is compiled from the catalogues of enzyme
suppliers listed in Table 4. The enzymes listed in
Table 21 are those that do not cut the OCV, the
construction of which is described above. The codes
used in the ambiguous DNA are shown in Table 1.

Using the procedure given in Sec. 4.3, we design a
ipbd gene, such as that shown in Table 22 and in Table
23. The recognition sequences of commercially
available enzymes that recognize five or more bases are
shown in Table 4. Some of these enzymes (e.g. Ban I or
Hph I) cut the OCV too often to be of value. A summary
of restriction sites in the designed pbd gene is given
in Table 24.
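
The search for places where a recognition site could be created without altering the amino-acid sequence can be sketched as follows. This is an illustration only, not the program named above: it uses the standard genetic code, handles only unambiguous recognition sequences, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for i, aa in enumerate(AMINO):
    codon = BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4]
    CODONS.setdefault(aa, []).append(codon)


def silent_site_positions(protein, site):
    """Return 0-based nucleotide offsets at which `site` can be encoded
    by some choice of synonymous codons for `protein`."""
    hits = []
    n = len(site)
    for start in range(3 * len(protein) - n + 1):
        first = start // 3                  # first codon touched by the site
        last = (start + n - 1) // 3         # last codon touched by the site
        choices = [CODONS[aa] for aa in protein[first:last + 1]]
        offset = start - 3 * first
        # Try every combination of synonymous codons covering the window.
        if any("".join(combo)[offset:offset + n] == site
               for combo in product(*choices)):
            hits.append(start)
    return hits


# Example: Ser-Arg can be encoded as TCT AGA, which is an Xba I site.
print(silent_site_positions("ASRL", "TCTAGA"))
```

Because at most a few codons overlap any one site, exhaustive enumeration of synonymous codons in each window is cheap; sites with degenerate positions would need an additional matching step.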

The entire DNA sequence of the m13cp-bpti fusion
with annotation appears in Table 25 showing the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the Shine-
Dalgarno sequence, the amino acid sequence, the stop
codons, and the transcriptional terminator.

The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating ds
DNA fragments of 150 to 190 base pairs. In this
example, the 3' overlap window (N_w) is set to run from
23 to 27 which is generous. The end spacers (N_s) that
are added to insure efficient digestion are set to 8,
which is also generous. Syntheses designed with
smaller overlaps and shorter spacers would allow longer
fragments of dsDNA to be synthesized and consume less
of the reagents. Note, however, that Oliphant and
Struhl (OLIP87) required large excesses of restriction
enzymes meant to cut near the ends of their dsDNA; this
could have been because they had set N_s = 2.

All DNA synthesis and purification is done by
standard methods as described in Sec. 5.2.

The four steps (See Sec. 6.1) by which we clone
synthetic fragments of the m13cp-bpti gene (the osp-
ipbd gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 20.



The sequence to be introduced into pLG3 is shown
in Table 26 and in Table 27. The segment is 158 bases
long and is synthesized from two shorter synthetic
oligo-nts as described in Sec. 5.1 of the generic
specification. The important features of this segment
are five restriction sites, the lacUV5 promoter, a
Shine-Dalgarno site, and the trpA attenuator as shown
in Table 26.

Table 27 repeats the anti-sense strand shown in
Table 26. The 99 base fragment shown in upper case
letters and underscored (5'-CCGTCC.....CCTTCG-3' =
olig#3) is synthesized in the standard manner.
Similarly, the 100 base long fragment of the sense
strand shown in lower case (5'-egctea....aattg-3' =
olig#4) is synthesized. After annealing, the double-
stranded region is extended with Klenow fragment by the
procedure given above to make the entire 176 bases
double stranded. The overlap region is 23 base pairs
long and contains 14 CG pairs and 9 AT pairs. The DNA
between Avr II and Asu II does not code for anything in
the final pbd gene; it is there so that the DNA can be
cut by both Avr II and Asu II at the same time in the
next step. This spacer was made rich in C and G so
that annealing of the two single-stranded DNA fragments
will be efficient. Eight bases have been added to the
left of Rsr II and nine bases have been added to the
left of Sau I (same specificity and cutting pattern as
Bsu36 I). These bases at the ends are not part of the
final product; they must be present so that the
restriction enzymes can bind and cut the synthetic DNA
to produce specific sticky ends.
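
The arithmetic of the fill-in step (99 + 100 - 23 = 176) can be illustrated with a short sketch. The toy sequences in the example are invented for illustration and are not the oligo-nts of Table 27; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_length(len_a, len_b, overlap):
    """Length of the duplex after both 3' ends are extended to completion."""
    return len_a + len_b - overlap


def klenow_fill(sense_oligo, antisense_oligo, overlap):
    """Return the sense strand of the fully extended duplex formed by two
    oligo-nts whose 3' ends are complementary over `overlap` bases."""
    partner = reverse_complement(antisense_oligo)   # antisense read as sense
    if partner[:overlap].upper() != sense_oligo[-overlap:].upper():
        raise ValueError("3' ends are not complementary over the stated overlap")
    return sense_oligo + partner[overlap:]


# Lengths quoted in the text for the first synthetic segment.
print(duplex_length(99, 100, 23))

# Toy example (invented sequences): 8-mers overlapping by 4 bases.
toy_sense = "ACGTTGCA"
toy_antisense = reverse_complement("TGCAGGCC")
print(klenow_fill(toy_sense, toy_antisense, 4))
```

The same relation shows why narrower overlaps and shorter end spacers leave more of each synthetic oligo-nt available for the final product.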

The synthetic DNA is cut with both Sau I and
Rsr II and purified by HPLC or PAGE. RF pLG3 is cut
with Sau I and Ava II and purified by HPLC or agarose
gel electrophoresis and electroelution. The large
piece from the phagemid and the synthetic DNA are
ligated and used to transform E. coli; ampicillin-
resistant colonies are obtained and plasmids are
screened by endonuclease digestion of RF phagemid DNA.
The desired product can be cut by Avr II, Asu II, or
BstE II, but the original phagemid can not be cut by
any of these enzymes. To verify the insert, DNA from
isolates that have the correct restriction sites is
sequenced from about 10 bases above the Sau I site to
about 10 bases below the Rsr II site. The construct
with the correct insert is called pLG4.


The second step of the construction of the OCV is
illustrated in Tables 28 and 29. This second segment
of DNA is 155 bases long. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized. A 99 base long fragment of the anti-sense
strand (5'-CCACCA....CGTCCC-3' = olig#5) is shown in
upper case letters and underscored; the other piece of
99 bases (5'-gatcta....atcacct-3' = olig#6) is shown
in lower case and is a fragment of the sense strand.
These strands are complementary over 24 bases,
containing 14 CG base pairs and 10 AT base pairs.
Klenow fragment is used to extend in both directions to
produce ds DNA. Both the synthetic dsDNA and RF pLG4
DNA are cut with both Avr II and Asu II and purified by
HPLC or the appropriate type of gel electrophoresis.
The backbone from the phagemid pLG4 and the synthetic
DNA are ligated and used to transform E. coli.
Ampicillin-resistant colonies are obtained and plasmids
are screened by restriction digestion. The desired
product can be cut by any of Afl II, Nhe I, Nru I, Kpn
I, Acc III, Ava I, Xho I, PflM I, Apa I, Dra II, Pss I,
or BssH I while pLG4 can not be cut by any of these
enzymes. To verify the insert, DNA from phagemids with
the correct restriction sites is sequenced from about
10 bases above the BstE II site to about 10 bases below
the Avr II site. The construct carrying this second
insert is called pLG5.

Construction of pLG6 proceeds similarly to the
construction of pLG5. The sequences are shown in
Tables 30 and 31. The two single stranded segments
(olig#7 and olig#8) are synthesized, annealed, and
extended with Klenow fragment. The overlap region
comprises 25 base pairs, 15 CG and 10 AT. Both the
synthetic DNA and RF pLG5 are cut with both BssH I and
Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are obtained and plasmids are
screened by restriction digestion. The desired
phagemid can be cut with any of Stu I, Acc I, Xca I,
Esp I, Xma III, Sph I, Bbe I, or Nar I, while pLG5 can
not be cut by any of these enzymes. To verify the
third insert, DNA from phagemids with the correct
restriction map is sequenced from about 10 bases above
the Asu II site to about 10 bases below the BssH I
site. The construct with the correct third insert is
called pLG6.

The construction of pLG7 is illustrated in Tables
32 and 33 and proceeds similarly to the constructions
of pLG4, pLG5, and pLG6. The two single stranded
segments (olig#9 and olig#10) are synthesized,
annealed, and extended with Klenow fragment. Both the
synthetic DNA and RF pLG6 are cut with both Bbe I and
Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are screened by restriction
digestion of phagemid RF DNA. The desired phagemid can
be cut with any of Sfi I, Hind III, Mlu I, BstX I, or
Nco I, while pLG6 can be cut by none of these enzymes.
To verify the fourth insert, DNA from phagemids with
the correct restriction sites is sequenced from about
10 bases above the Asu II site to about 10 bases below
the Bbe I site. The construct with the correct fourth
insert is called pLG7; the display of BPTI on the outer
surface of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutation of M13 used to
reduce non-specific binding by the affinity matrix for
phages derived from M13. M13am429 is derived by
standard genetic methods (MILL72) from wtM13. M13am429
is grown on E. coli strain PE383 (F+, SupE, Rec-, AmpS)
and harvested by the standard method.

Phage LG7 is grown on E. coli strain PE384 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0 or
100.0 uM, or 1.0 mM IPTG, harvested (See Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain
a titre of 10^12 pfu/ml by the method of Messing
(MESS83).


The preferred method of determining whether LG7
displays BPTI on its surface (See Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains 10 tyrosine residues and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g. biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of 0.1
uCi/ug. A sample of 10^12 LG7 (10 mM IPTG) is mixed with
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM
KCl, adjusted to pH 8.0 with 1 mM K2HPO4/KH2PO4. The
mixture is passed through an Amicon MPS-1 system fitted
with a membrane filter that allows passage of proteins
smaller than M_r = 300,000. Filters are soaked in
buffer containing trp or AHTrp prior to the analysis.
The filter is washed twice with 0.5 ml of buffer
containing trp or AHTrp. The radioactivity retained on
the filter is quantitated with a scintillation counter
or other suitable device. If each virion displays one
copy of BPTI, then .25 ug of protein can be bound that
would give rise to 3 x 10^4 disintegrations / minute on
the filter.



An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
A solution that titers 10^13 pfu/ml of a phage is
approximately 1.6 x 10^-8 M in phage if each virion is
infective. The ratio of pfu to total phage can be
determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^12
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.8 x 10^-7 M, we calculate that
there are approximately 30 BPTIs/GP (i.e. (4.8 x 10^-7
molecules of BPTI/l)/(1.6 x 10^-8 phage/l)). Inhibition
of a specified concentration of trypsin is most easily
measured spectrophotometrically using a peptide-linked
dye, such as N-alpha-benzoyl-Arg-Nan (TSCH87).
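
The titration arithmetic can be sketched as below. The sketch assumes, as the text does, that every virion is infective unless a different infective fraction is supplied; the function names are hypothetical, and the titre of 10^13 pfu/ml is used only because it corresponds to the phage concentration of about 1.6 x 10^-8 M quoted above.

```python
AVOGADRO = 6.022e23   # particles per mole


def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of virions for a given titre.

    `infective_fraction` is the ratio of pfu to total phage; 1.0 means
    every virion is assumed to be infective.
    """
    particles_per_litre = pfu_per_ml / infective_fraction * 1000.0
    return particles_per_litre / AVOGADRO


def copies_per_virion(inhibited_trypsin_molar, phage_molar):
    """BPTI domains per virion, assuming 1:1 binding of trypsin to BPTI."""
    return inhibited_trypsin_molar / phage_molar


molarity = phage_molarity(1e13)
print(f"{molarity:.2e} M")
print(round(copies_per_virion(4.8e-7, 1.6e-8)))
```

If the spectrophotometric measurement shows that only a fraction of the virions are infective, passing that fraction lowers the estimated number of BPTI domains per virion in proportion.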

Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10 (TM) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (V_v) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted {AHTrp}.

A sample of 10^12 M13am429 is applied to {AHTrp} in
1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4/
K2HPO4. The column is then washed with the same buffer
until the optical density at 280 nm of the effluent
returns to base line or 4 x V_v have been passed through
the column, whichever comes first. Samples of LG7 or
LG10 are then applied to the blocked {AHTrp} column at
10^12 pfu/ml in 1.0 ml of the same buffer. The column
is then washed again with the same buffer until the
optical density at 280 nm of the effluent returns to
base line or 4 x V_v have been passed through, whichever
comes first. Following this wash, a gradient of KCl
from 10 mM to 2 M in 3 x V_v, buffered to pH 8.0 with
phosphate is passed over the column. The first KCl
gradient is followed by a KCl gradient running from 2 M
to 5 M in 3 x V_v. The second KCl gradient is followed
by a gradient of guanidinium Cl from 0.0 M to 2.0 M in
2 x V_v in 5 M KCl and buffered to pH 8.0 with
phosphate. Fractions of 50 ul are collected and
assayed for phage by plating 4 ul of each fraction at
suitable dilutions on sensitive cells. Retention of
phage on the column is indicated by appearance of LG7
phage in fractions that elute significantly later from
the column than control phage LG10 or wtM13. A
successful isolate of LG7 that displays BPTI is
identified, the bpti insert and junctions are
sequenced, and this isolate is used for further work
described below. It is likely that a significant
fraction of clonal isolates from the same ligation that
are characterized as identical by restriction digestion
will similarly display BPTI.
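
The elution program can be written as a piecewise-linear function of the volume passed. This is a sketch under stated assumptions: the gradients are taken to be linear, the second KCl gradient is taken to start at 2 M where the first ends, and the function name and default void volume are chosen here for illustration.

```python
def eluent_composition(volume_ml, void_volume_ml=1.0):
    """Return (KCl molar, guanidinium chloride molar) of the eluent after
    `volume_ml` of the three-gradient program has been applied."""
    v = volume_ml / void_volume_ml          # volume in units of V_v
    if v < 0 or v > 8:
        raise ValueError("program runs from 0 to 8 void volumes")
    if v <= 3:
        # First gradient: 10 mM to 2 M KCl over 3 x V_v.
        return (0.010 + (2.0 - 0.010) * v / 3.0, 0.0)
    if v <= 6:
        # Second gradient: 2 M to 5 M KCl over 3 x V_v (assumed start at 2 M).
        return (2.0 + 3.0 * (v - 3.0) / 3.0, 0.0)
    # Third gradient: 0 to 2 M guanidinium chloride over 2 x V_v in 5 M KCl.
    return (5.0, 2.0 * (v - 6.0) / 2.0)


def fraction_count(void_volume_ml=1.0, fraction_ul=50.0):
    """Number of fractions collected over the whole 8 x V_v program."""
    return round(8 * void_volume_ml * 1000.0 / fraction_ul)


print(eluent_composition(1.5), fraction_count())
```

With a 1.0 ml void volume and 50 ul fractions the program yields 160 fractions, so a fraction number maps directly onto the salt or denaturant concentration at which a phage elutes.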

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed if possible by
silent codon changes as follows. A de novo piece of
synthetic DNA is synthesized such that the selected
amino acid sequence is preserved and cloned into pLG7.
The sequence numbers of residues in OSP-IPBD will be
changed by any insertions; hereinafter we will,
however, denote residues inserted after residue 23 as
23a, 23b, etc. Insertions after residue 81 will be
denoted as 81a, 81b, etc. This preserves the numbering
of residues between C5 in BPTI and C55 in BPTI.
Residue C5 of BPTI is always denoted as 28 in the
fusion; residue C55 of BPTI is always denoted as 78 in
the fusion, and the intervening residues have constant

Should LG7 phage from cells grown with 10 mM IPTG
fail to display BPTI on its surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using labeled trypsin or anhydrotrypsin.

Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trypsin labeled with 125I would be
mixed with supernatant and passed over a non-
denaturing molecular sizing column and the radioactive
fractions collected. The radioactive fractions would
then be analyzed by SDS-PAGE and examined for BPTI-
sized bands by silver staining.



Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts would be mixed with
AHTrp* and then either filtered or centrifuged to
separate them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts would be analyzed
for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI were found attached to the inner membrane,
then two hypotheses can be formed. The first is that
the chimeric protein is being cut after the signal
sequence, but is not being incorporated into LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if signal sequence is not cleaved. N-
terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If signal sequence were being
cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If signal sequence were
not being cleaved, we would vary residues between 23
and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-
BPTI junction. The treatment in this case would be to
vary residues between 23 and 27.
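
The diagnostic reasoning of the preceding paragraphs can be summarized as a small decision function. This is a sketch only; the function name and the string labels for the three observations are hypothetical.

```python
def residues_to_vary(bpti_location, signal_cleaved=None):
    """Return the (first, last) residues of AA_seq2 bounding the region to
    variegate, given where BPTI is found in LG7-infected cells.

    bpti_location: "periplasm", "inner membrane", or "not found".
    signal_cleaved: result of N-terminal sequencing; needed only when BPTI
    is found on the inner membrane.
    """
    if bpti_location == "periplasm":
        # Chimera is cut at both junctions: alter the BPTI/M13 CP junction.
        return (78, 82)
    if bpti_location == "inner membrane":
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing result is required")
        # Cleaved but not incorporated: vary the BPTI/M13 CP junction.
        # Not cleaved: vary the signal-sequence/BPTI junction.
        return (78, 82) if signal_cleaved else (23, 27)
    if bpti_location == "not found":
        # Fault lies in the signal sequence or its junction with BPTI.
        return (23, 27)
    raise ValueError(f"unknown location: {bpti_location!r}")


print(residues_to_vary("inner membrane", signal_cleaved=False))
```

Every foreseen outcome maps onto one of only two regions, which is why variegating both regions directly can replace the analysis.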

Analytical experiments to determine what has gone
wrong take time and effort and, for the foreseen
outcomes, indicate variations in only two regions.
Therefore, we believe it prudent to try the synthetic
experiments described below without doing the analysis.
For example, these six experiments that introduce
variegation into the bpti-gene VIII fusion could be
tried:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 82
into the Sph I and Sfi I sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:

residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

              84  85  86  87  88  89  90  91
              GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

              82  83  84  85  86  87
              GCT|GAA|GGT|GAT|GAT|CCG|-

              88  89  90  91
              GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

              81c 81d 82  83  84  85  86  87
              qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

              88  89  90  91
              GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12b

residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'    olig#13

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
RF pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
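
The amino-acid distribution encoded by one variegated "qfk" codon can be computed from the base mixtures given above. The sketch below uses the standard genetic code and assumes that k denotes equal parts of T and G; all names in the code are chosen here for illustration.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}   # first base mixture
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}   # second base mixture
K = {"T": 0.50, "G": 0.50}                         # third base, assumed T/G


def codon_distribution(first, second, third):
    """Probability of each amino acid (or '*' for stop) for one codon whose
    three positions are drawn independently from the given base mixtures."""
    dist = defaultdict(float)
    for (b1, p1), (b2, p2), (b3, p3) in product(
            first.items(), second.items(), third.items()):
        dist[CODON_TABLE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)


QFK = codon_distribution(Q, F, K)
for aa, p in sorted(QFK.items(), key=lambda item: -item[1]):
    print(f"{aa}  {p:.3f}")
```

Under these assumptions every amino acid can occur, and the only stop codon that can arise is TAG, so the chance that a stretch of n variegated codons is free of stops is (1 - P(stop))^n.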

To vary the junction between M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the Kpn I and Xho I
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:

residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTC|TCT|TTT|GCT|qfk|qfk|-

             26  27  28  29  30
             qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTC|TCT|TTT|GCT|qfk|qfk|qfk|-

             26a 26b 27  28  29  30
             qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14a

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTC|TCT|TTT|GCT|qfk|qfk|qfk|-

             26a 26b 26c 26d 27  28  29  30
             qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3'    olig#15

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. One of
the variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The ds DNA is
completed with all four (nt)TPs and Klenow fragment.
The resulting dsDNA and RF pLG7 are cut with both Kpn I
and Xho I, purified, mixed, and ligated. This ligation
mixture goes through the process described in Sec. 15
in which we select a transformed clone that, when
induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used.

If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence. If that doesn't work, we may try a different
OSP in M13 because the structural data clearly indicate
that BPTI could not be joined to the carboxy terminus.
The next best OSP of M13 is the gene III protein
because there is fusion data (SMIT85, CRUZ88).



Example 1. Part II
BPTI binds very tightly to trypsin
(K_d = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimizing the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:
Dissociation constants for BPTI derivatives, Molar.

Columns: Residue #15; Trypsin (bovine pancreas);
Chymotrypsin (bovine pancreas); Elastase (porcine
pancreas); Elastase (human leukocytes).

Rows (residue #15): lysine, glycine, alanine, valine,
leucine.



6.0 x 10 



-14 



9.0 x 10 



-9 



3.5 x 10 



-6 



7.0 >: 10" 



2.8 x 10" 



2.5 x 10 



5.7 x 10 



-3 



1-. 1 xlC 



-10 



1.9 X 10" 3 2-9 x 10* 9 



From the report of Tschesche et al. we infer that
molecular pairs marked "+" have K_d s greater than
3.5 x 10^-6 M and that molecular pairs marked "-" have
K_d s much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain
two different monoclonal antibodies, one with a high
affinity having K_d of order 10^-11 M, and one with a
moderate affinity having K_d on the order of 10^-6 M.)
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(A15) (residue 38
in the osp-ipbd gene) for trypsin (weak but detectable) or
for porcine pancreatic elastase.

Following the teachings of Sec. 10, we compare the
retention of LG7 virions to the retention of wild-type
M13 on {AHTrp}. M13 derivatives having more DNA than
wild-type M13 have correspondingly longer virions. Thus
we will create pLG8 that differs from pLG7 only in
having stop codons at codons 2 and 3, and an altered L
codon at codon 7 of the osp-ipbd gene. Phage LG8 will
have exactly as much DNA as LG7; therefore the LG8
virion is exactly as long as the LG7 virion. LG8 can
not, however, display BPTI on its surface. To generate
these mutations we synthesize the oligo-nt

5' (121) |aac|gct|agc|ctt|Cag|aac|cag|aga|ttA|ctA|cat|-
         |agt|gag|cct| (80) 3'    olig#11

that is complementary to bases 80 through 121 of the
osp-ipbd gene, shown in Table 23, except for the three
upper-case, underscored bases. Olig#11 and the
primers olig#24, olig#25, and olig#26 are annealed to
circular ssDNA from LG7. Klenow fragment (from US
Biochemical) and all four (nt)TPs are used to complete
the circular dsDNA. After treatment with Klenow
fragment, the dsDNA is treated with ligase. Cells are
transformed with the ligated dsDNA and, after a short
growout, the cells are plated on ampicillin-containing
LB agar. By changing the third base in codon 7, we
have destroyed the unique Afl II site in pLG7. Thus we
can screen colonies for loss of the Afl II site. To
confirm the construction, DNA from plaques with no
Afl II site is sequenced from about base :-;0 to about
base 40 of the osp-ipbd gene.

To expedite identification of different M13-derived
phage, we replace the amp^R gene of LG8 with the
tet^R gene from pBR322. Plasmid pBR322 is cut with
Bsm I at the unique site at 1353 and the linearized DNA
is blunted with Klenow fragment and purified. The
blunt DNA is cut with Aat II and the 1423-base tet^R-
bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG8 ds DNA is cut
with Xba I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG8
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG10. DNA from phage LG10 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having bed volume of 2.0
ml and supporting an amount of HuLE1 picked from the
range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi-Gel
10(TM) or Affi-Gel 15(TM) is designated {HuLE1}.
An appropriate set of densities of HuLE1 on the column
is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 3.0 mg/ml, 12.0
mg/ml, and 30.0 mg/ml). The V_v of {HuLE1} is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on {HuLE1} having
varying amounts of HuLE1 affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_v, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x V_v,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_v, with phosphate buffer to pH 8.0.



The preferred level of induction (IPTG_optimal) and
amount of affinity molecule on the matrix
(DoAMoM_optimal) are those settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG8, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied
to BioRad Affi-Gel 10(TM).

When the amount of BPTI/GP and the amount of
HuLE1/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
parameters can be optimized separately.

Using optimal BPTI/GP and HuLE1/volume of support,
we measure the elution volume of LG7 and LG10 for
different elution rates, viz. 1, 1/2, 1/4, 1/8 and 1/16
times the maximum flow rate. M13 is shear resistant,
so that the pressure that can be applied across the
column is limited only by the mechanical properties of
the support material. By hypothesis, 1/4 of maximum
elution rate is better than 1/2, but 1/8 is about the
same as 1/4. Therefore 1/4 maximum elution rate will
be used.

Elution volumes of LG7 obtained from cells grown
on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10,
10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of pure
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^11 pfu overloads the
column only slightly, and that 10^10 pfu does not
overload the column. Because the use of the affinity
separation in Sec. 15 will involve a population in
which no single member is more than one part in 10^4, we
conclude that 10^12 pfu of a variegated population could
be applied to a column of 1.0 ml matrix volume without
overloading with respect to any one species. The
overloading of a 1.0 ml column by 10^12 pfu also
indicates that the initial column that captures
indiscriminately adhesive phage should be 5 to 10 times
as large as the column that supports the target
material.
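
The loading argument above is simple arithmetic and can be checked
with a short Python sketch. It assumes the figures read from the
text (a total load of 10^12 pfu, no single member above one part
in 10^4, and 10^10 pfu of a single species as the largest load that
does not overload a 1.0 ml column); the function name is
illustrative only.

```python
def per_species_load(total_pfu, parts):
    """Largest load of any one species when no member of the population
    exceeds one part in `parts` of the total."""
    return total_pfu / parts


TOTAL_PFU = 1e12          # variegated population applied to the column
MAX_ONE_PART_IN = 1e4     # no single member exceeds 1 part in 10^4
NO_OVERLOAD_PFU = 1e10    # single-species load that does not overload

load = per_species_load(TOTAL_PFU, MAX_ONE_PART_IN)
overloaded = load > NO_OVERLOAD_PFU
print(f"per-species load: {load:.1e} pfu, overloaded: {overloaded}")
```

On these figures the largest single-species load is two orders of
magnitude below the level that was found not to overload the column.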

Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal DoAMoM and elution rate and for a
loading of 10^10 pfu for various initial ionic
strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM, and 50.0
mM. We find that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.



To determine the sensitivity of chromatography of
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely-related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoM_optimal mg of AHTrp)/(ml of Affi-Gel 10(TM)).
The column is called {AHTrp} and has V_v = 1.0 ml.

A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD in contrast to LG7 that displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene. The two oligo-nts
     3'    olig#16
     5'    olig#17
           | Stu I


are synthesized by standard methods and annealed; the
lower case letters in olig#16 and the upper case
letters in olig#17 are mutant with respect to pLG7.
Plasmid pLG7 DNA is digested with both Apa I and Stu I
and the large piece purified. The ds oligo-nt is added
to the purified backbone of pLG7 and ligated; the
ligated DNA is used to transform competent cells.
After a short growout, the cells are plated on
ampicillin-containing plates and Amp^R colonies are
picked. The mutations destroy the unique BssH II site,
thus we can screen colonies through restriction
digestion. To confirm the construction, DNA from
colonies having the correct restriction digestion
pattern is sequenced from about 10 bases above the
Stu I site to about 10 bases below the Apa I site. The
correct construction is called pLG9.

To expedite differentiation between LG7 and an
LG9-derivative phage, we replace the amp^R gene of LG9
with the tet^R gene from pBR322. Plasmid pBR322 is cut
with Bsm I at the unique site at 1353 and the
linearized DNA is blunted with Klenow fragment and
purified. The blunt DNA is cut with Aat II and the
1420-base tet^R-bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG9 ds DNA is cut
with Xba I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG9
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG11. DNA from phage LG11 is sequenced in
the vicinity of the junctions of the newly inserted tet^R
gene to confirm the construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM)
and harvested. Mixtures are prepared in the ratios
1 LG7 : V_lim LG11,
where V_lim ranges from 10^10 to 10^5 by factors of 10.
Large values of V_lim are tested first; once a V_lim is
found that allows recovery of LG7, smaller values of
V_lim are not tested. Once a value of V_lim is found
that allows recovery of LG7, we test values that are
larger by 2-, 4-, or 8-fold so that V_lim is determined
within a factor of 2.
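
The search for V_lim described above (a coarse scan by decades from
large to small, then a refinement by 2-, 4-, and 8-fold steps) can
be sketched in Python as follows. The predicate `recoverable` is a
hypothetical stand-in for the whole chromatography-and-plating
experiment at a given ratio, and the sketch assumes that recovery
succeeds for every ratio at or below some threshold.

```python
def find_v_lim(recoverable, start_exp=10, stop_exp=5):
    """Return the largest tested ratio for which recovery succeeds.

    recoverable(v) -> bool is a hypothetical stand-in for one full
    experiment at the ratio 1 : v.
    """
    # Coarse scan: 10^10, 10^9, ..., 10^5; stop at the first success.
    for exp in range(start_exp, stop_exp - 1, -1):
        coarse = 10 ** exp
        if recoverable(coarse):
            best = coarse
            # Refinement: try values 2-, 4-, and 8-fold larger.
            for factor in (2, 4, 8):
                if recoverable(coarse * factor):
                    best = coarse * factor
            return best
    return None  # nothing recoverable in the scanned range


# Illustration with an assumed threshold of 4.0 x 10^8.
v_lim = find_v_lim(lambda v: v <= 4.0e8)
print(v_lim)
```

With the assumed threshold, the coarse scan succeeds first at 10^8
and the refinement raises the estimate to 4 x 10^8.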

The column {AHTrp} is first blocked by treatment
with 10^11 virions of M13(am429) in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD_280 returns to base line
or 4 x V_v have passed through the column, whichever
comes first. One of the mixtures of LG7 and LG11
containing 10^12 pfu in 1 ml of the same buffer is
applied to {AHTrp}. The column is eluted in a standard
way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_v, whichever is first, (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH
held at 8.0 with phosphate, (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v,
phosphate buffer to pH 8.0, (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_v, with phosphate buffer to pH 8.0, (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x V_v, with phosphate buffer to pH 8.0, (12 x
100 ul fractions).
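
The fraction counts in steps 2 through 5 follow from the step
volumes, V_v = 1.0 ml, and the 100 ul fraction size. The Python
sketch below tabulates the fractions and, as an added assumption
not stated in the text, takes each gradient to be linear and
reports the nominal concentrations at the midpoint of each
fraction. All names are illustrative.

```python
V_V_UL = 1000        # void volume V_v in microlitres (1.0 ml)
FRACTION_UL = 100    # volume of each collected fraction

# (step, volume in units of V_v, KCl start/end in M, guanidinium Cl start/end in M)
STEPS = [
    (2, 3.0, 0.01, 2.0, 0.0, 0.0),
    (3, 3.0, 2.0, 5.0, 0.0, 0.0),
    (4, 2.0, 5.0, 5.0, 0.0, 0.8),
    (5, 1.2, 5.0, 5.0, 0.8, 0.8),
]


def fractions():
    """Return (step, fraction number, KCl M, guanidinium Cl M) per fraction."""
    rows = []
    for step, vol, k0, k1, g0, g1 in STEPS:
        n = round(vol * V_V_UL / FRACTION_UL)   # fractions in this step
        for i in range(n):
            t = (i + 0.5) / n                   # midpoint, linear gradient assumed
            rows.append((step, i + 1, k0 + t * (k1 - k0), g0 + t * (g1 - g0)))
    return rows


rows = fractions()
for step in (2, 3, 4, 5):
    count = sum(1 for r in rows if r[0] == step)
    print(f"step {step}: {count} fractions")
```

The counts reproduce the 30, 30, 20, and 12 fractions listed for
steps 2 to 5.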



Samples of 4 ul from each fraction are plated at
suitable dilution on phage-sensitive Sup^- cells (so
that M13(am429) will not grow). In addition to the
effluent fractions, a sample is removed from the column
and used as an inoculum for phage-sensitive Sup^- cells.
Plaques are transferred to ampicillin-containing LB
agar. Colonies that are ampicillin-resistant are
tested for display of BPTI(K15) by use of trp* or
AHTrp*. Testing begins with colonies obtained by
culturing an inoculum from the column, proceeds to the
last effluent fraction, and works backwards toward
earlier fractions. Once a positive colony is found, no
further tests are required for that value of V_lim. If
no BPTI positive colonies are detected, the population
of phage obtained from the column matrix and the last
(e.g. 5 to 10) phage-bearing fractions are merged
and cultured. Phage are harvested from this culture
and chromatographed by the above procedure. This
process continues until a positive colony is isolated
or N_chrom passes of chromatography and growth have been
completed. If no positive colonies are detected after
N_chrom passes of enrichment, V_lim is reduced by a
suitable factor and the process is repeated.
V_lim = 4.0 x 10^8 is the largest value for which LG7
can be recovered. Thus C_sensi = 4.0 x 10^8. Three
cycles of chromatography are required to isolate LG7,
so the first approximation to C_eff is
740 ( = exp( log_e(4.0 x 10^8)/3 ) ).
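
The first approximation to C_eff is the cube root of C_sensi, that
is, the per-cycle enrichment that compounds to C_sensi over three
cycles. A short Python check, assuming C_sensi = 4.0 x 10^8 and
three cycles (the function name is illustrative):

```python
import math


def per_cycle_enrichment(c_sensi, n_cycles):
    """Per-cycle enrichment that compounds to c_sensi over n_cycles."""
    return math.exp(math.log(c_sensi) / n_cycles)


c_eff = per_cycle_enrichment(4.0e8, 3)
print(f"C_eff is approximately {c_eff:.0f}")
```

The result is close to 737, which the text rounds to 740.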





We now determine the efficiency of the affinity
separation (Sec. 10.2). This is done by: a) preparing
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching
the population for LG7 for one separation cycle, and c)
determining the fraction of LG7 in the last phage-
bearing fraction. The phage are obtained from cultures
induced at 10.0 uM IPTG, the optimal level. Q is
decreased until roughly half the phage are LG7. We
start with Q = 1.5 x 10^4 = 20 x approximate C_eff. The
mixture is applied to a {AHTrp} column bearing 4.0 mg
AHTrp on 1.0 ml of Affi-Gel 10 (the optimal DoAMoM) and
eluted in the specified manner. A sample of 4 ul from
each fraction is plated at suitable dilution on phage
sensitive cells on LB agar. The identity of colonies
in the last phage-bearing fraction is determined by
transferring colonies to ampicillin-containing and
tetracycline-containing plates; colonies that show Tet^R
are from LG11 and colonies that show Amp^R are from LG7.
When Q is 1.5 x 10^4, ...% of colonies are BPTI positive.
When Q is 1.5 x 10^3, 60% of the colonies are BPTI
positive. Thus we calculate C_eff = 0.60 x 1.5 x 10^3 =
900.
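
The single-pass estimate of C_eff and the choice of the starting
value of Q can be reproduced with a few lines of Python. The sketch
assumes the readings 60% positive colonies at Q = 1.5 x 10^3 and a
first approximation to C_eff of 740, and it follows the text's own
arithmetic (fraction positive times Q); the names are illustrative.

```python
def c_eff_single_pass(fraction_positive, q):
    """Single-pass efficiency as computed in the text: fraction x Q."""
    return fraction_positive * q


c_eff = c_eff_single_pass(0.60, 1.5e3)

# Starting ratio: about 20 times the first approximation to C_eff.
approx_c_eff = 740
q_start = 20 * approx_c_eff

print(f"C_eff = {c_eff:.0f}")
print(f"starting Q = {q_start} (text uses 1.5 x 10^4)")
```

Twenty times 740 is 14,800, which the text rounds to 1.5 x 10^4.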

Myoglobin is strongly colored and it is possible
that binding of HHMb to M13 could provide enough
optical absorption to allow FACS sorting of M13 that
bind HHMb (See Sec. 10.4).

We have now constructed LG7 that displays one or
more BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI-M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction has been determined. An optimum amount of
AfM(PBD), DoAMoM_optimum = 2.0 mg/(ml of support), has
been determined; target molecules will be applied to
columns at this level in the process disclosed in Sec.
15.1. These optimum levels may be adequate for all
targets and all variegations of BPTI displayed on
derivatives of M13 based on LG7, but some further
optimization may be needed if other values of pH or
temperatures are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that PBD will appear on the surface of the new LG7
derivative.


HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH . < and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, M_r = l'>,000, 5)
HHMb requires haem, 6) HHMb has no proteolytic
activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences
and it is generally assumed that the 3D structures are
almost identical, 3) HHMb has no enzymatic activity, 4)
HHMb is not toxic.
We set the specifications of an SBD as

1) T = 25°C

2) pH = 8.0

3) Acceptable solutes:

   A) for binding:

      i) phosphate, as buffer, 0 to 20 mM, and

      ii) KCl, 10 mM,

   B) for column elution:

      i) phosphate, as buffer, 0 to 30 mM,

      ii) KCl, up to 5 M, and

      iii) Guanidinium Cl, up to 0.8 M.

4) Acceptable K_d < 1.0 x 10^-8 M.

We choose LG7 as GP(IPBD).

As stated in Sec. 13.1, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
section, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing
residues to vary includes: 1) the 3D structure of BPTI,
2) solvent accessibility of each residue as computed by
the method of Lee and Richards (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.



Tables 16 and 34 indicate which residues of BPTI:
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable and such that all members of one set can touch
a molecule of the target material at one time. If BPTI
has a small amino acid at a given residue, that amino
acid may not be able to contact the target
simultaneously with all the other residues in the
interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect
binding without making direct contact. In such cases,
the residue should be included in the interaction set,
with a notation that larger residues might be useful.
In a similar way, large amino acids near the geometric
center of the interaction set may prevent residues on
either side of the large central residue from making
simultaneous contact. If a small amino acid, however,
were substituted for the large amino acid, then the
surface would become flatter and residues on either
side could make simultaneous contact. Such a residue
should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts
and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is
rigidly attached to the protein main-chain; rotation
about the C_alpha-C_beta bond is the most important
degree of freedom for determining the location of the
side group.
Table 34 indicates five surfaces that meet the
given criteria. The first surface comprises the set of
residues that actually contacts trypsin in the complex
of trypsin with BPTI as reported in the Brookhaven
Protein Data Bank entry "1TPA". This set is indicated
by the number "1". The exposed surface of the residues
in this set (taken from Table 16) totals 1148 A^2.
Although this is not strictly the area of contact
between BPTI and trypsin, it is approximately the same.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable, surface residues can be picked. Set #2 is
shown in stereo view, Figure 10, including the alpha
carbons of BPTI, the disulfide linkages, and the side
groups of the set. We take the orientation of BPTI in
Figure 10 as a standard orientation, and hereinafter
refer to K15 as being at the top of the molecule, while
the carboxy and amino termini are at the bottom.
Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.



To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
vary them with a high probability of retaining the
underlying structure. Independent substitution at each
of these twelve residues of the amino acid types
observed at that residue would produce approximately
4.4 x 10* 5 amino acid sequences and the same number of
surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues so that the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or
S. All types of amino acids are seen: large, small,
charged, neutral, and hydrophobic. That no acidic
groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.



Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and
the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophobic or
hydrophilic in that region. It is also possible that
the other aromatic amino acid (H) or the other
hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and
T are also observed. On structural grounds, this
residue will probably tolerate any hydrophilic amino
acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
Six other types of amino acids have been observed at
this residue: K, ..., Q, R, H, and .... Small side groups
at this residue might not contact HHMb simultaneously
with residues 17 and .... Large side groups could
interact with HHMb at the same time as residues ....
Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other
residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.

Residue 29 is highly variable, most often
containing L. This fully exposed position will
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.
Now we consider possible variation of the
secondary set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring
residues that might be varied at later stages include
9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N),
26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, is slightly far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
Q. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.

Residue 22 is only slightly varied, being Y, F, or
H in 30 of 33 cases. Nevertheless, A, N, and S have
been observed at this residue. Amino acids such as L,
M, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the
surface defined by residue set #2.



Residue 24 shows some variation, but probably can
not interact with one molecule of the target
simultaneously with all the residues in the principal
set. Variation in charge at this residue might have an
effect on binding at the surface defined by the
principal set.



Residue 26 is highly varied and exposed. Changes
in charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility
of residue 27 that is in the principal set.

Residue 35 is most often Y; W has been observed.
The side group of 35 is buried, but substitution of F
or W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample
used. The O_gamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude
other amino acid types at this residue. In particular,
other amino acids the side groups of which can accept
hydrogen bonds, viz. N, D, Q, and E, may be acceptable
here.

Residue 50 is often an acidic amino acid, but
other amino acids are possible.

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may
affect binding to the amino acids in interaction set
#2.



Stereo Figure 10 shows the residues in set #2,
plus R39. From Figure 10, one can see that R39 is on
the opposite side of BPTI from the surface defined by
the residues in set #2. Therefore, variation at
residue 39 at the same time as variation of some
residues in set #2 is much less likely to improve
binding that occurs along surface #2 than is variation
of the other residues in set #2.



In addition to the twelve principal residues and
13 secondary residues, there are two other residues,
30(C) and 33(F), involved in surface #2 that we will
probably not vary, at least not until late in the
procedure. These residues have their side groups
buried inside BPTI and are conserved. Changing these
residues does not change the surface nearly so much as
does changing residues in the principal set. These
buried, conserved residues do, however, contribute to
the surface area of surface #2. The surface of residue
set #2 is comparable to the area of the trypsin-binding
surface. Principal residues 17, 19, 21, 27, 28, 29,
31, 32, 34, 48, 49, and 52 have a combined solvent-
accessible area of 946.9 A^2. Secondary residues 9, 11,
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have
combined surface of 1041.7 A^2. Residues 30 and 33 have
exposed surface totaling 38.2 A^2. Thus the three
groups' combined surface is 2026.8 A^2.



Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however,
that C14/C38 is conserved in all natural sequences, yet
Marks et al. (MARK87) showed that changing both C14 and
C38 to A,A or T,T yields a functional trypsin
inhibitor. Thus it is possible that BPTI-like
molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of V, M, H, or L might be
tolerated.

Having identified twenty residues that define a
possible binding surface, we must choose some to vary
first. Given our hypothetical affinity separation
sensitivity, C_sensi, we decide to vary six residues
leaving some margin for errors in the actual base
composition of variegated bases. To obtain maximal
recognition, we choose residues from the principal set
that are as far apart as possible. Table 36 shows the
distances between the beta carbons of residues in the
principal and peripheral set. R17 and V34 are at one
end of the principal surface. Residues A27, G28, L29,
A48, E49, and M52 are at the other end, about twenty
Angstroms away; of these, we will vary residues 17, 27,
29, 34, and 48. Residues 28, 49, and 52 will be varied
at later rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x
10^7 amino acid sequences. By hypothesis, C_sensi is 1
in 4 x 10^8. Table 37 shows the programmed variegation
at the chosen residues. The parental sequence is
present as 1 part in 5.5 x 10^7, but the least favored
sequences are present at only 1 part in 4.2 x 10^9.
Among single-amino-acid substitutions from the PPBD,
the least favored is F17-I19-A27-L29-V34-A48 and has a
calculated abundance of 1 part in 1.6 x 10^8. Using the
optimal qfk codon, we can recover the parental sequence
and all one-amino-acid substitutions to the PPBD if
actual nt compositions come within 5% of programmed
compositions. The number of transformants is M_ntv =
1.0 x 10^9 (also by hypothesis), thus we will produce
most of the programmed sequences.
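The library-size arithmetic in this paragraph can be sketched as follows. The count of 20^6 sequences and the 10^9 transformants come from the text. The Poisson estimate of whether a given sequence appears at least once among the transformants is our own simplifying assumption, not a formula given in the specification, and the function name is illustrative.

```python
import math

N_VARIED = 6
n_sequences = 20 ** N_VARIED      # unlimited variation of six residues
n_transformants = 1.0e9           # M_ntv, by hypothesis in the text

def prob_present(one_in, n_clones):
    """Poisson estimate (an assumption) that a sequence whose abundance is
    1 part in `one_in` occurs at least once among `n_clones` transformants."""
    return 1.0 - math.exp(-n_clones / one_in)

parental = prob_present(5.5e7, n_transformants)   # parental sequence
least = prob_present(4.2e9, n_transformants)      # least favored sequences

print(f"sequences: {n_sequences:.2e}")
print(f"P(parental present)      = {parental:.6f}")
print(f"P(least favored present) = {least:.3f}")
```

Under this assumption the parental sequence is essentially certain to be present, while any one of the least favored sequences has roughly a one-in-five chance, consistent with the statement that most, but not all, programmed sequences are produced.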

The residue numbers of the preceding section are
referred to mature BPTI (R1-P2-...-A58). Table 25 has
residue numbers referring to the pre-M13CP-BPTI
protein; all mature BPTI sequence numbers have been
increased by the length of the signal sequence, i.e.
23. Thus in terms of the pre-OSP-PBD residue numbers,
we wish to vary residues 40, 42, 50, 52, 57, and 71. A
DNA subsequence containing all these codons is found
between the (Apa I/Dra II/Pss I) sites at base 191 and
the Sph I site at base 309 of the osp-pbd gene. Among
Apa I, Dra II, and Pss I, Apa I is preferred because it
recognizes six bases without any ambiguity. Dra II and
Pss I, on the other hand, recognize six bases with two-
fold ambiguity at two of the bases. The vgDNA will
contain more Dra II and Pss I recognition sites at the
varied locations than it will contain Apa I recognition
sites. The unwanted extraneous cutting of the vgDNA by
Apa I and Sph I will eliminate a few sequences from our
population. This is a minor problem, but by using the
more specific enzyme (Apa I), we minimize the unwanted
effects. The sequence shown in Table 37 illustrates an
additional way in which gratuitous restriction sites
can be avoided in some cases. The osp-ipbd gene had
the codon GGC for G51; because we are varying both
residue 50 and 52, it is possible to obtain an Apa I
site. If we change the glycine codon to GGT, the Apa I
site can no longer arise. Apa I recognizes the DNA
sequence (GGGCC/C).
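The point about the glycine codon can be illustrated by brute-force enumeration. The sketch below assumes that the varied codons 50 and 52 may carry any base at their first two positions and T or G at the third (the pattern of a qfk codon), reads the original G51 codon as GGC and its replacement as GGT, and uses the Apa I recognition sequence GGGCCC. The function names are our own.

```python
from itertools import product

APA_I = "GGGCCC"  # Apa I recognition sequence (cut as GGGCC/C)

def nnk_codons():
    """All 32 codons with any base at positions 1-2 and T or G at position 3."""
    return ["".join(p) + k for p in product("TCAG", repeat=2) for k in "TG"]

def apa_site_can_arise(gly_codon):
    """True if some choice of the two flanking varied codons creates an
    Apa I site within the three-codon window codon50-G51-codon52."""
    return any(
        APA_I in c50 + gly_codon + c52
        for c50 in nnk_codons()
        for c52 in nnk_codons()
    )

for gly in ("GGC", "GGT"):
    print(gly, "->", "site possible" if apa_site_can_arise(gly) else "site impossible")
```

With GGC, a codon 50 ending in G followed by a codon 52 starting with CC yields G-GGC-CC, which is an Apa I site; with GGT the T blocks every possible window.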

Each piece of dsDNA to be synthesized needs six to
eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The
first synthetic base (before cutting with Apa I and Sph
I) is 184 and the last is 322. There are 142 bases to
be synthesized. The center of the piece to be
synthesized lies between Q54 and V57. The overlap can
not include varied bases, so we choose bases 245 to 256
as the overlap that is 12 bases long. Note that the
codon for F56 has been changed to TTC to increase the
GC content of the overlap. The amino acids that are
being varied are marked as X with a plus over them.
Codons 57 and 71 are synthesized on the sense (bottom)
strand. The design calls for "qfk" in the antisense
strand, so that the sense strand contains (from 5' to
3') a) equal parts C and A (i.e. the complement of k),
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and
0.18 G).
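The strand bookkeeping can be sketched in Python. The q and f compositions below are not quoted directly in this passage; they are the mixtures implied by taking the complement of the sense-strand compositions given above, with G as the fourth base in each mix, and k is equal parts T and G. The helper name is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_mix(mix):
    """Map a base mixture to the mixture needed on the opposite strand."""
    return {COMPLEMENT[base]: frac for base, frac in mix.items()}

# Antisense-strand codon 5'-q f k-3' (compositions inferred as described above).
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.50, "G": 0.50}

# Read 5' to 3', the opposite strand carries the complements in reverse order:
# complement of k first, then of f, then of q.
sense = [complement_mix(m) for m in (k, f, q)]

for position, mix in zip("abc", sense):
    print(position, {base: mix[base] for base in sorted(mix)})
```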



Each residue that is encoded by "qfk" has 21
possible outcomes, each of the amino acids plus stop.
Table 12 gives the distribution of amino acids encoded
by "qfk", assuming 5% errors. The abundance of the
parental sequence is the product of the abundances of R
x I x A x L x V x A. The abundance of the least-
favored sequence is 1 in 4.2 x 10^9.




Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I
and Sph I. The cut DNA is purified and the appropriate
pieces ligated (See Sec. 14.1) and used to transform
competent PE383 (Sec. 14.2). In order to generate a
sufficient number of transformants, V_c is set to 5000
ml.



1) culture E. coli in 5.0 l of LB broth at 37°C
until cell density reaches 5 x 10^7 to 7 x 10^7
cells/ml,

2) chill on ice for 65 minutes, centrifuge the
cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in
1667 ml of an ice-cold, sterile solution of 60
mM CaCl2,

4) chill on ice for 15 minutes, and then
centrifuge at 4000g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x
400 ml of ice-cold, sterile 60 mM CaCl2; store
cells at 4°C for 24 hours,

6) add DNA in ligation or TE buffer; mix and
store on ice for 10 minutes; 20 ml of solution
containing 5 ug/ml of DNA is used,

7) heat shock cells at 42°C for 90 seconds,

8) add 200 ml LB broth and incubate at 37°C for
1 hour,

9) add the culture to 2.0 l of LB broth
containing ampicillin at 35-100 ug/ml and
culture for 2 hours at 37°C,

10) centrifuge at 8000 g for 20 minutes at 4°C,

11) discard supernatant, resuspend cells in 50
ml of LB broth plus ampicillin and incubate 1
hour at 37°C,

12) plate cells on LB agar containing
ampicillin,

13) harvest virions by method of Salivar et al.
(SALI64).




The heat shock of step (7) can be done by dividing the
200 ml into 100 200 ul aliquots in 1.5 ml plastic
Eppendorf tubes. It is possible to optimize the heat
shock for other volumes and kinds of container. It is
important to: a) use all or nearly all the vgDNA
synthesized in ligation, this will require large
amounts of pLG7 backbone, b) use all or nearly all the
ligation mixture to transform cells, and c) culture all
or nearly all the transformants at high density. These
measures are directed at maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the
optimal level) and virions are harvested in the usual
way (Sec. 14.3). It is important to collect virions in
a way that samples all or nearly all the transformants.
Because F- cells are used in the transformation,
multiple infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out
chromatography at pH 8.0 so that HHMb is slightly
negative while BPTI and most of its mutants are
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column
on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml
support matrix, the same density that is optimal for a
column supporting trp.

We note that charge repulsion between BPTI and
HHMb should not be a serious problem and does not
impose any constraints on ions or solutes allowed as
eluants. Neither BPTI nor HHMb has special
requirements that constrain choice of eluants. The
eluant of choice is KCl in varying concentrations.

To remove variants of BPTI with strong,
indiscriminate binding for any protein or for the
support matrix (Sec. 15.2), we pass the variegated
population of virions over a column that supports
bovine serum albumin (BSA) before loading the
population onto the {HHMb} column. Affi-Gel 10(TM) or
Affi-Gel 15(TM) is used to immobilize BSA at the
highest level the matrix will support. A 10.0 ml
column is loaded with 5.0 ml of Affi-Gel-linked-BSA;
this column, called {BSA}, has V_v = 5.0 ml. The
variegated population of virions containing 10^12 pfu in
1 ml (0.2 x V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0
buffer is applied to {BSA}. We wash {BSA} with 4.5 ml
(0.9 x V_v) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer.
The wash with 50 mM salt will elute virions that adhere
slightly to BSA but not virions with strong binding.
The pooled effluent of the {BSA} column is 5.5 ml of
approximately 43 mM KCl.
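The volume and salt concentration of the pooled effluent follow from a simple mixing calculation. The sketch below assumes ideal mixing of the 1 ml load (10 mM KCl) with the 4.5 ml wash (50 mM KCl) and ignores any buffer already held in the column; the function name is illustrative.

```python
def pooled_concentration(parts):
    """Return (total volume, concentration) for ideally mixed solutions.

    `parts` is a list of (volume_ml, concentration_mM) pairs.
    """
    total_volume = sum(volume for volume, _ in parts)
    total_solute = sum(volume * conc for volume, conc in parts)
    return total_volume, total_solute / total_volume

load = (1.0, 10.0)   # 1 ml of 10 mM KCl carrying the virions
wash = (4.5, 50.0)   # 4.5 ml of 50 mM KCl wash

volume_ml, kcl_mM = pooled_concentration([load, wash])
print(f"{volume_ml:.1f} ml at about {kcl_mM:.0f} mM KCl")
```

Under these assumptions the pool is 5.5 ml at roughly 43 mM KCl.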

The column {HHMb} is first blocked by treatment
with 10^11 virions of M13(am429) in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD_280 returns to base line
or 2 x V_v have passed through the column, whichever
comes first. The pooled effluent from {BSA} is added
to {HHMb} in 5.5 ml of 43 mM KCl, 1 mM phosphate, pH
8.0 buffer. The column is eluted (Sec. 15.3) in the
following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 2 x V_v, whichever is first, (effluent
discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH
held at 8.0 with phosphate, (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_v, with phosphate buffer to pH 8.0, (20 x
100 ul fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1
x V_v, with phosphate buffer to pH 8.0, (10 x 100
ul fractions).
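The elution schedule above can be tabulated to check its internal consistency. The sketch below encodes steps 2 through 5 (step 1 is discarded and has no fixed fraction count) and derives the column volume V_v implied by each step from its fraction count; the data layout and names are our own.

```python
FRACTION_UL = 100  # each collected fraction is 100 ul

# (description, volume in units of V_v, number of fractions collected)
STEPS = [
    ("10 mM -> 2 M KCl gradient", 3, 30),
    ("2 M -> 5 M KCl gradient", 3, 30),
    ("5 M KCl + 0 -> 0.8 M guanidinium Cl gradient", 2, 20),
    ("5 M KCl + 0.8 M guanidinium Cl", 1, 10),
]

# V_v (in ul) implied by each step: collected volume divided by column volumes.
implied_vv_ul = {n * FRACTION_UL / volumes for _, volumes, n in STEPS}
total_fractions = sum(n for _, _, n in STEPS)
total_volume_ml = total_fractions * FRACTION_UL / 1000

print("implied V_v (ul):", implied_vv_ul)
print("fractions:", total_fractions, "total volume (ml):", total_volume_ml)
```

Every step implies the same V_v of 1.0 ml, and the schedule yields 90 fractions in 9 ml of eluate.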

In addition to the elution fractions, a sample is
removed from the column and used as an inoculum for
phage-sensitive Sup- cells (Sec. 15.4). A sample of 4
ul from each fraction is plated on phage-sensitive Sup-
cells. Fractions that yield too many colonies to
count are replated at lower dilution. An approximate
titre of each fraction is calculated. Starting with
the last fraction and working toward the first fraction
that was titered, we pool fractions until approximately
10^9 phage are in the pool, about 1 part in 1000 of
the phage applied to the column. This population is
infected into 3 x 10^11 phage-sensitive PE384 in 300 ml
of LB broth. The very low multiplicity of infection
(moi) is chosen to reduce the possibility of multiple
infection. After thirty minutes, viable phage have
entered recipient cells but have not yet begun to
produce new phage. Phage-borne genes are expressed at
this phase, and we can add ampicillin that will kill
uninfected cells. These cells still carry F-pili and
will absorb phage, helping to prevent multiple
infections.
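The effect of a low multiplicity of infection can be estimated with Poisson statistics. This is a sketch under stated assumptions: phage are taken to adsorb to cells independently and at random, and the example numbers (10^9 phage, 3 x 10^11 cells) are our reading of the figures in the paragraph above. The function name is illustrative.

```python
import math

def multiply_infected_fraction(n_phage, n_cells):
    """Among infected cells, the fraction that received two or more phage,
    assuming Poisson-distributed adsorption (an assumption)."""
    moi = n_phage / n_cells
    p_at_least_one = -math.expm1(-moi)               # 1 - exp(-moi)
    p_at_least_two = p_at_least_one - moi * math.exp(-moi)
    return moi, p_at_least_two / p_at_least_one

moi, fraction = multiply_infected_fraction(1e9, 3e11)
print(f"moi = {moi:.4f}; multiply infected fraction of infected cells = {fraction:.4%}")
```

At an moi near 1/300, fewer than 0.2% of the infected cells would be expected to carry more than one phage genome.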

If multiple infection should pose a problem that
cannot be solved by growth at low multiplicity-of-infection
on F+ cells, the following procedure can be employed to
obviate the problem. Virions obtained from the
affinity separation are infected into F+ E. coli and
cultured to amplify the genetic messages (Sec. 15.5).
CCC DNA is obtained either by harvesting RF DNA or by
in vitro extension of primers annealed to ss phage DNA.
The CCC DNA is used to transform F- cells at a high
ratio of cells to DNA. Individual virions obtained in
this way should bear only proteins encoded by the DNA
within.

The variegation produced as many as 6.4 x 10^7
different amino-acid sequences. C_eff is 900. Thus,
after two separation cycles, the probability of
isolating a single SBD is less than 0.10; after three
cycles, the probability rises above 0.10.
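A rough feel for these numbers comes from a simple multiplicative enrichment model. This is our own simplification, not the probability calculation used in the specification: a single desired sequence is assumed to start at 1 part in 6.4 x 10^7 and to be enriched by a factor C_eff in each separation cycle, with the fraction capped at 1.

```python
N_SEQUENCES = 20 ** 6   # 6.4 x 10^7 possible amino-acid sequences
C_EFF = 900             # enrichment per separation cycle, from the text

def enriched_fraction(cycles, start=1.0 / N_SEQUENCES):
    """Fraction of the population made up of the desired sequence after
    `cycles` separation cycles (simplified multiplicative model)."""
    return min(1.0, start * C_EFF ** cycles)

for cycles in (1, 2, 3):
    print(cycles, f"{enriched_fraction(cycles):.4g}")
```

In this model the desired sequence is about 1% of the pool after two cycles and dominates after three, which is in line with the thresholds quoted above.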

The phagemid population is grown and
chromatographed three times and then examined for SBDs
(Sec. 15.7). In each separation cycle, phage from the
last three fractions that contain viable phage are
pooled with phage obtained by removing some of the
support matrix as an inoculum. At each cycle, about
10^12 phage are loaded onto the column and about 10^9
phage are cultured for the next separation cycle.
After the third separation cycle, 32 colonies are
picked from the last fraction that contained viable
phage; phage from these colonies are denoted SBD1,
SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for
retention on a Pep-Tie column supporting HHMb (Sec.
15.8). Phage LG7(SBD11) shows the greatest retention
on the Pep-Tie {HHMb} column, eluting at 367 mM KCl
while wtM13 elutes at 20 mM KCl. SBD11 becomes the
parental amino-acid sequence to the second variegation
cycle.

The result of this hypothetical experiment is
shown in Table 38. R40 changed to D, I42 changed to
Q, A50 changed to E, L52 remained L, and A71 changed to
W.

The next round of variegation (Sec. 16) is
illustrated in Table 39. The residues to be varied are
chosen by: a) choosing some of the residues in the
principal set that were not varied in the first round
(viz. residues 42, 44, 51, 54, 55, 72, or 75 of the
fusion), and b) choosing some residues in the secondary
set. Residues 51, 54, 55, and 72 are varied through
all twenty amino acids and, unavoidably, stop. Residue
44 is only varied between Y and F. Some residues in
the secondary set are varied through a restricted
range, primarily to allow different charges (+, 0, -)
to appear. Residue 38 is varied through K, R, E, or G.
Residue 41 is varied through I, V, K, or E. Residue 43
is varied through R, S, G, N, K, D, E, T, or A.

Olig#29 and olig#30 are synthesized, annealed,
extended and cloned into pLG7 at the Apa I/Sph I sites.
The ligation mixture is used to transform 5 l of
competent PE383 cells so that 10^9 transformants are
obtained. A new {HHMb} is constructed using the same
support matrix as was used in round 1. A sample of
10^12 of the harvested LG7 are applied to {HHMb} and
affinity separated. The last 10^9 phage off the column
and an inoculum are pooled and cultured. The cultured
phagemids are re-chromatographed for three separation
cycles. Thirty-two clonal isolates (denoted SBD11-1,
SBD11-2, ..., SBD11-32) are obtained from the effluent
of the third separation cycle and tested for binding on
a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows
the greatest retention on the Pep-Tie {HHMb} column,
eluting at 692 mM KCl.

The results of this hypothetical selection are
shown in Table 40. Residue 38 (K15 of BPTI) changed to
E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to
F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned
into an expression vector and BPTI(E15, D17, V18, Q19,
N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72) is
expressed in the periplasm. This protein is isolated
by standard methods and its binding to HHMb is tested.
K_d is found to be 4.5 x 10^-7 M.



A third round of variation, using SBD11-23 as
PPBD, is illustrated in Table 41; eight amino acids are
varied. Those in the principal set, residues 40, 55,
and 57, are varied through all twenty amino acids.
Residue 32 is varied through P, Q, T, K, A, or E.
Residue 34 is varied through T, P, Q, K, A, or E.
Residue 44 is varied through F, L, Y, C, W, or stop.
Residue 50 is varied through E, K, or Q. Residue 52 is
varied through L, F, I, M, or V.

The result of this variation is shown in Table 42.
The selected SBD is denoted SBD11-23-5 and elutes from
a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5
segment is cloned into an expression vector and
BPTI(E9, Q11, E15, A17, V18, Q19, N20, W21, Q27, F28,
M29, S31, L32, K34, W71, Q72) is produced. This time
the K_d is 7.3 x 10^-9 M.

This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^-8 M. It is also possible
that more than three separation cycles will be needed
in some variegation cycles. Real DNA chemistry and DNA
synthesizers may have larger errors than our
hypothetical 5%. If S_err > 0.05, then we may not be
able to vary six residues at once. Variation of 5
residues at once is certainly possible.

Table 1: Single-letter codes.



Single-letter code is used for proteins

a = ALA    c = CYS    d = ASP    e = GLU    f = PHE
g = GLY    h = HIS    i = ILE    k = LYS    l = LEU
m = MET    n = ASN    p = PRO    q = GLN    r = ARG
s = SER    t = THR    v = VAL    w = TRP    y = TYR
. = STOP   [illegible] = any amino acid

b = n or d
z = e or q
x = any amino acid

Single-letter IUB codes for DNA

T, C, A, G stand for themselves
M for A or C
R for puRines A or G
W for A or T
S for C or G
Y for pYrimidines T or C
K for G or T
V for A, C, or G   (not T)
H for A, C, or T   (not G)
D for A, G, or T   (not C)
B for C, G, or T   (not A)
N for any base.

Table 2: Preferred Outer-Surface Proteins

              Preferred
Genetic       Outer-Surface
Package       Protein         Reason for preference

M13           coat protein    a) exposed amino terminus,
              (gpVIII)        b) predictable post-
                                 translational
                                 processing,
                              c) numerous copies in
                                 virion.
              gp III          a) fusion data available.

PhiX174       G protein       a) known to be on virion
                                 exterior,
                              b) small enough that
                                 the G-ipbd gene can
                                 replace H gene.

E. coli       LamB            a) fusion data available,
                              b) non-essential.

B. subtilis   CotC            a) no post-translational
spores                           processing,
                              b) distinctive sequence
                                 that causes protein to
                                 localize in spore coat,
                              c) non-essential.
              CotD            Same as for CotC.

Table 3: Ambiguous DNA for AA_seq2



a 
1 

A.T.G 



a 
9 

G.C.n 



17 
G.T. n 



k 
2 

A. A. r 



10 
T.C. n 
A. G.y 



P 
18 
C.C. n 



P I d 
25 26 
C . C . n | C . A . y 





k 




s 




1 




V 




1 




3 


i 


4 




5 




6 




7 


A 


.A.r 


T 


C.n 


T 


,T.r 


G 


T . n 


T 


T. r 






A 


G.y 


C 


.T.n 






C 


T.n 




V 




a 




V 




a 




t 




11 




12 




13 




14 




15 


C 


.T.n 


G 


C.n 


G 


T.n 


G 


C.n 


A 


C.n 




n 




1 




s 




f 




b, 




19 




2C 




21 




22 




23 


A 


. T . G 


T. 


T. r 


T, 


C.n 


T 


T. y 


G 


C.n 






C. 


T.n 


A. 


G.y 












f 




c 




1 




e 




P 




27 




28 




29 




30 




31 


T 


.T. y 


T. 


G.y 


T. 


T.r 


G. 


A.r 


C. 


C.n 










C. 


T.n 











y 

33 
T.A.y 



L 

4 1 
A. T.n 



-I O 

| A. A. r 



! ! 



t g 

34 I 35 
A . C . n | G . G . n 



l 

. 42 
A . T . h 



a 

50 
C.C.n 



r 
43 
C.G.n 



P 

36 
C.C.n 



y 

44 
T.A.y 



c I k 

37 38 
T.G.yi A. A. r 



k 
8 

A. A. r 



1 

16 
T.T.r 
C.T.n 



r ! 

24 
C.C.n 
A.C. r 

? I 
32 , 

CC.nl 



a r 
39 I 40 | 
C.C.n C.C.n! 
|A.G.rj 



g I 1 



51 
C.C.n 



52 
T.T.r 
C.T.n 



f 

45 

T.T. y 



c 
53 
T.G.y 



y 

46 
T.A.y 



n 

47 
A . A . y 



«« i 

G . C . n i 



q j t I f I 
54 I 55 56 
C.A.rj A.C.n|T.T.yj 



y ! 


<5 


«3 


c 


r 




a 


k 


53 ; 


59 


60 


61 


62 




63 


64 


A.yjc 


C.n 


G .G . n 


T.G.y 


C.C.n 
A . G . r 


G 


C.n 


A. A. r 



E'.v 

S*-; : v 



yr- 

fe 



r. : .. 
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r 

65 
C.C.n 
A.C. r 



d 
73 
C.A.y 



a 

81 
C.C.n 



k 

89 
A . A. r 



n . 
66 
A.A.y 



c 

74 

T.G.y 



82 
C.C.n 



a 

90 
C.C.n 



n 

67 
A.A.y 



75 
A.T.C 



e 

83 
C. A. r 



f 

68 
T.T. y 



r 

76 
C.C.n 



3 
84 
C.C.n 



k 

69 
A. A. r 



s 

70 
T.C.n 
A.C.y 



a 

71 
C.C.n 



c 

72 
C.A. r 



t C q g 

77 78 j- 79 j HO 

A.C.n|T.C.ylC.C.n;C.C.n 



d 

85 
C.A.y 



a f n 

91 I 92 I 9 3 
C.C.n|T.T.y| A. A. y 



d 

86 
C.A.y 



94 
T.C.n 
A.G.y 



37 88 | 
C.C. n|C. C. n| 



1 

95 
T.T. r 
C.T.n 



q 

96 
C.A. : 



a 

97 
C.C.n 



s ■ a 
. 98 j 99 
T.C. niCC. n 
A.C.yi 



t 
100 
A.C. n 



101 
C.A. r 



102 j 103 j 104 j 
:.A.y|A.T.h|G.C.nj 



35 


y 

105 
T. a. y 


a 
106 
C.C.n 


V 

107 
T.C.C 


a 
108 
C.C.n 


n 
109 
A.T.C 


1 v 

110 
C.T.n 


1 - 

,C.T.n 


V 

112 
C.T.n 


40 


i 

113 
A.T.h 


V 

114 
C.T.n 


q 

115 
C.C.n 


lib 
C.C.n 


117 
A.C. n 


1 i j 9 
! 118 119 
1 A.T.hjc.C.n 


,5. 

A.T.h 


45 


k 
121 
A. A.r 


i_ 

122 
T.T. r 
C.T.n 


f 

123 
T.T. y 


k 
124 
A. A. r 


k 
125 
A . A . r 


f 

126 
T . T . y 


c 

127 
A.C.n 


s 
123 
T.C.n 
A.C.y 


"1 


k 

129 
A. A. r 


a 
130 
C.C.n 


s 
13 1 
T.C.n 
A.C.y 


. 

132 
T.A.r 
T.C. A 


133 

T.A.r 
T.C. A 


134 
T.A.r 
T.C. A 
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Table 4: Table of Restriction Enzymes 

Table of restriction enzymes with IUB codes.
Suppliers:

S=Sigma Chemical Co.
P.O. Box 14508
St. Louis, Mo. 63178

B=Bethesda Research Laboratories
P.O. Box 6009
Gaithersburg, Maryland, 20877

M=Boehringer Mannheim Biochemicals
7941 Castleway Drive
Indianapolis, Indiana, 46250

I=International Biochemicals, Inc.
P.O. Box 9558
New Haven, Connecticut, 06535

N=New England BioLabs
32 Tozer Road
Beverly, Massachusetts, 01915

P=Promega
2800 S. Fish Hatchery Road
Madison, Wisconsin, 53711

T=Stratagene Cloning Systems
11099 North Torrey Pines Road
La Jolla, California, 92037

+ before enzyme name means that overhang can not be
self-complementary.
% before enzyme name means that overhang may or may
not be self-complementary.
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Table 4, continued. 



Enzyme Recoqnit. 
Aat II GACGTC 
t Acc I GTHKAC 
ACC III TCCCGA 
Acv I GRCCYC 
Af 1 II CTTAAG 
Uf I IIIACRYCT 
Ah3 III TTTAAA 

t.VlvN I CACHNHCTG 



Supply 

<S.M, I,N.T 
<B,M, I.N, P, 



Aoa I 
ApaL I 
Ase I 
Aso7i8 
Asu II 
*Ava I 



GCCCCC 
CTCCAC 
ATT AAT 
GGTACC 
TTCGAA 
CYCCRG 



Ava III ATCCAT 



Avr II 
Dal I 
BanH I 

\Mn I 

Bbe I 
I Sbv I 
tBby I 
Be L I 
+ Bg_L I 
Bgl II 
+ Bvn I 
+£sni I 
BspH I 
X BsoM 
BssH I 
+ BstE 
I BstX 
Ctr I 
Cla I 



CCTAGC 
. TGGCCA 

GCATCC 

GCYRCC 

GGCGCC 

GCACC 
I CAAGAC 

TGATCA 

GCCKNNNNGCC 

AGATCT 

GCATC 

GAATCCN 

TCATGA 
I- ACCTGC 
I CCCCGC 

iicct;;acc 

I CCANNN'tlN'NTCG 
YGCCCR 
ATCGAT 




+ 0ra II RGCNCCY ? 2,5 

+ D£a IIICACKNNCTC P 6. 3 

Eco4l IIAGCCCT P 3, 3 

+£coH I CCTNN'i.'NNAGG P 5 , 6 

ECOR I CAATTC P 1. 5 

EcoR V GATATC P 3,3 

» Esp I CCTNACC P 2, 5 

t Fok I GGATG nP 1- . 18 

GJ1 II YCCCCC nP 1, 5 

Hae I ■ WCGCCW P 3, 3 < 

Hae II RCCCCY P 5. r <S. B.M. I. K.T 



<S, 3,M,H,T; 

SfMn III : I 
<::.T ; E COO109 
<K. S, T 
<nor.e 
<:; { soon) 
<S. 3. M. I 



i::x 



<M 
< 



N, P, 



, N , T 



^18 




0 
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Table 


4 , 


continued . 


+ Hga I 


CACCC 


nP 


10, 15 


<r; 


\HqiA I 


CWGCWC 


p 


5, 


1 


<N 


UlgiC I 


GGYRCC 


p 


0 , 


6 


< 


\HqiJ IICRGCYC 


p 


5 , 


1 


*-Ban II:S,M,I,N,T 


Hind II 


CTYRAC 


p 


3 , 


3 


<M; &Hinc II:S,B,I 












,N,P,T 


Hind I I IAAGCTT 


p 


1 , 


5 


<S, B,M, I, N, P,T 


Ho^ I 


CTTAAC 


p 


3 , 


3 


<S, B.M, I, N, P ,T 


+ Hph I 


CGTGA 


nP 


13 , 


1 2 


<N,T 


Kp" I 


CGTACC 


P 


5, 


1 


<S, 3,M, I. N. P/T ; 












Asp718 :M 


+ i<bo 1 1 


CAACA 


nP 


1 3 , 


1 2. 


<S t B , I , N 


M I u I 


ACCCCT 


P 


1 , 


5 


<M , N , P , T 


Mst I 


TGCGCA 


P 


3 t 


3 


<T .* £s£ I : S , N 


a c I 


CCCGGC 


P 


3 , 


3 


<M, N,T 


r I 


GGCGCC 


P 


2 t 


4 


<B, N , T 


Nco I 


CCATCG 


P 


1, 


5 


< B , M , N , P , T 




CATATG 


P 


2, 


4 


<B, N,T 


Nhe I 


GCTACC 


P 


1, 


5 


<M,N, ?, T 


Not I 


CCGGCCGC 


P 


2, 


6 


<M, N, P,T 


Uru I 


TCGCCA 


P 


3, 


3 


<B,M,N,T 




RCATCY 


P 


5, 


1 


< 


NspS II 


CMGCKG 


P 


3, 


3 


< 


+ Pf 1M I 


CCANfWHNTGG 


P 


7 , 


4 




*Ple I 


CAGTCUNNNN 


nP 


9 , 


10 


<U 


Pr.^C I 


CACGTC 


' P 


3 , 


3 


<none 


+ £2l?.M I 


RGGWCCY 


P 


2 , 


5 


<U 


+ Pss I 


RGCMCC'/ 


P 


5, 


2 


<l 


Pst: "l 


CTCCAC 


P 


5, 


1 


<S, B, M, I , N, ?, T 


Pvu .1 


CGATCG 


P 


4 , 


2 


<S, D, N\ Bf Xor II) , M 












, P,T 


Pvu II 


CACCTG 


p 


3, 


3 


<S,3,K,I,N,P,T 


+Rsr II 


CCGWCCG 


P 


2, 


5 




Sac I 


GAGCTC 


P 


5, 


1 


<B(Ssr I] .M.I.N.P, 


Sac II 


CCGCCG 


P 


4 , 


2 


T 

<B(S«£ 11} . I, N, P,T 


Sal I 


GTCGAC 


P 


1, 


5 


<B,M, I.U, P ( T 


+ Sau I 


CCTNAGG 


P 


2, 


5 


<M; Cvn I:B; Mst II 












:T; nsu3 6 I:N; Aoc 


Sea I 


AGTACT 


P 


3 , 


3 


<M, t;, P,T 


tSfaN I 


CCATC 


nP 


10, 14 


<N 


+s_ii I 


CCCCNNtUiWCCCC P 


s, 


5 


<N, P,T 


Srca I 


CCCCGG 


P 


3 , 


3 


<B,M,I,H,P,T 


SnaD I 


TACGTA 


P 


3, 


3 


<H, N,T 


Spe I 


ACTAGT 


p 


1, 


5 


<M,N,T 


Sph I 


GCATGC 


P 


5, 


1 


<8, M, I ( N, P,T 


ssd r 


AATATT 


P 


3, 


3 


<M,N,T 


Stu I 


ACCCCT 


P 


3, 


3 


<t\,U, I ( Ajvt I) ,P,T 


iSty I 


ccwvcg 


P 


1, 


5 


<N, P,T 



I:T 




- ^ 




Mi 





Tabic e. , continued. 



* Taq II CACCCA 
* Taq II'CACCCA 
+ TthllU CACNfJNCTC 
t Tthlll ll CAARCA 



Xba 
Xca 
Xho 



TCTACA 
GTATAC 
CTCCAC 



nP 17,15 
nP 17, 15 
P 5 
■ 16, 14 
1, 5 



nP ■ 
P 



10 



Xho II 
Xr.a I 
Xna 



RCATCY 

cccccc 

III CGGCCC 



15 Xnn. I 



GAAHNHNTTC 



5, 5 



<none 
<none 
<I , N , T 
<none 

<B,M, I,N, P,T 
<U ( soon) 
<B,M,I,P,T; Cc£ 
T ; PaeR7 I:t 
<M.T ;Hf BstY 1) 
< I , N , P,T 
<B; Eas I:M; 

Eco52 I:T 
<tJ,Mf ASP70Q ) ,T 



N restrct - 



100 



20 



25 



Notes: 
Syran : 
cut* : 



P for pal indror.ic, nP for non-pal indronic 

first nunber indicates position of cut in 
top strand, 1 r.cans after first base of 
recognition; second nunber indicates 
position of cut in lower strand, counting 
left-to-right. 





Table 5: Potential sites in ipbd gene.

Summary of cuts.

Enz = %Acc I has 3 elective sites: 96 169 281
Enz = Afl II has 1 elective sites: 19
Enz = Apa I has 2 elective sites: 102 103
Enz = Asu II has 1 elective sites: 381
Enz = Ava III has 1 elective sites: 314
Enz = BspM II has 1 elective sites: 72
Enz = BssH II has 2 elective sites: 67 115
Enz = %BstX I has 1 elective sites: 323
Enz = +Dra II has 3 elective sites: 102 103 226
Enz = +EcoN I has 2 elective sites: 62 94
Enz = +Esp I has 2 elective sites: 57 187
Enz = Hind III has 6 elective sites: 9 23 60 287 361 386
Enz = Kpn I has 1 elective sites: 48
Enz = Mlu I has 1 elective sites: 314
Enz = Nar I has 2 elective sites: 238 343
Enz = Nco I has 1 elective sites: 323
Enz = Nhe I has 3 elective sites: 25 289 388
Enz = Nru I has 2 elective sites: 38 65
Enz = +PflM I has 1 elective sites: 94
Enz = PmaC I has 1 elective sites: 223
Enz = +PpuM I has 2 elective sites: 102 226
Enz = +Rsr II has 1 elective sites: 102
Enz = +Sfi I has 2 elective sites: 24 261
Enz = Spe I has 3 elective sites: 12 45 379
Enz = Sph I has 1 elective sites: 221
Enz = Stu I has 5 elective sites: 23 70 150 237 386
Enz = %Sty I has 6 elective sites: 11 44 143 263 323 383
Enz = Xba I has 1 elective sites: 84
Enz = Xca I has 2 elective sites: 96 169
Enz = Xho I has 1 elective sites: 85
Enz = Xma III has 3 elective sites: 70 209

Enzymes not cutting ipbd.

Avr II     BamH I     Bcl I      BstE II
EcoR I     EcoR V     Hpa I      Not I
Sac I      Sal I      Sau I      Sma I
Xma I

Table 6: Exposure of amino acid types in T4 lzm & HEWL.

HEADER    HYDROLASE (O-GLYCOSYL)    18-AUG-86    2LZM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    L.H.WEAVER, B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER    HYDROLASE (O-GLYCOSYL)    29-JUL-82    1LYM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    J.HOGLE, S.T.RAO, M.SUNDARALINGAM

Solvent radius = 1.40    Atomic radii in Table 7.

Surface area measured in Angstroms^2.



Type 



Hax 





U 


<area> 


ALA 


27 


211 


0 


CYS 


10 


239 


8 


ASP 


17 


271 


1 


GLU 


10 


297 


2 


PHE 


8 


316 


6 


CLY 


23 


IS5 


5 


HIS 


2 


297 


7 


ILE 


16 


273 


1 


LYS 


:o 


3 09 


2 


LEU 


24 


282 


6 


MET 


7 


293 


0 


ASN 


26 


273 


0 


PRO 


5 


239 


9 


CLN 


3 


299 


5 


ARC 


24 


344 


7 


SER 


16 


228 


6 


THR 


18 


250 


3 


VAL 


15 


254 


3 


TRP 


9 


359 


4 


TYR 


9 


335 


8 



sigma 



max 




min 
















exposedf f ractionl 
















214 


3 


207. 


1 


85 


1( 


0 


-0) 


245 


5 


2 34 . 


4 


38 


3 ( 


0. 


16) 


281 


4 


262. 


5 


127 


i( 


0. 




304 


9 


285. 


4 


100. 


7( 


0. 


34) 


325 


4 


307. 


5 


99 


8 { 


0 


32) 


133 


3 


183. 


3 


91 


9 ( 


0, 


50) 


301 


C 


294 . 


5 


32 


9( 


0 


11) 


285 


6 


269. 


6 


57 


5( 


0. 


21) 


321 


9 


j CO. 


1 


147 


1 ( 


0. 


45) 


304 


0 


269. 


3 


109 


0( 


0. 


39) 


299 


5 


283 . 


1 


88 


2 ( 


0 


30) 


235 


1 


262 . 


6 


143 


4 ( 


0 


53) 


242 


1 


234 . 


6 


I2S 


7 ( 


0 


54) 


305 


8 


291. 


5 


145 


9( 


0 


;?) 


355 


8 


326. 


7 


240 


7 ( 


0 


70) 


236 


6 


223. 


1 


95 


2 ( 


0 


■5 1) 


257 


2 


24 4 . 


2 


.139 


9( 


0 


56) 


261 


8 


245. 


7 


111 


1( 


0 


- 4 ) 


3G6 


4 


355. 


1 


102 


0( 


0 


23) 


342 


0 


325.0 


72 


.G( 


0 


22) 
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Table 7: Atomic radii 
Angs t r 



Calpha l - 10 

Ornrhnnvl 1.52 



J carbony 1 

"amide ^ 55 
10 Other atorr.s 1.80 



Table 8 

Fraction of DNA molecules having
n non-parental bases when
reagents that have fraction
M of parental nt are used.



10 


M 


-9965 


.97716 




.3577 


.79433 


.63096 




fo 


.9000 


. 5000 


. 1000 


.0100 


. 0010 


.000001 




f 1 


.09499 


.35061 


.2393 


.04977 


.00777 


. 0000175 




f2 


.00485 


. 1183 


. 2763 


. 1197 


.0292 


. 000149 




f3 


.00016 


.0259 


.2061 


. 1354 


.0705 


.000912 


15 . 


f4 . 


0COO04 


.00409 


..1110 


. 2077 


. 1232 


.003207 




f8 


0. 


2X10" 7 


.00096 


.0336 


.1182 


.030155 




f 16 


0. 


0. 


0. 


5xl0~ 7 


. 00006 


.027231 


20 


















£2 2 


0. 


0. 


0. 


0. 


0 . 


.0000059 




E2SC 


o 


o 


2 


5 


7 


12 



25 ,, raost ,, is the value of n having the highest 
probability.. 
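The entries of Table 8 behave like binomial probabilities. The sketch below is a minimal illustration, assuming 30 independently mutagenized bases; that length is not stated in the table and is inferred only from the f0 row, where M raised to the 30th power reproduces the listed values. The function name and the `length` parameter are hypothetical.

```python
from math import comb


def fraction_with_n_changes(n, m, length=30):
    """Binomial probability that exactly n of `length` bases are non-parental.

    m is the fraction of parental nucleotide in each reagent. The default
    length of 30 is an assumption inferred from the f0 row of Table 8.
    """
    p = 1.0 - m  # chance that a given base is non-parental
    return comb(length, n) * p**n * m ** (length - n)


if __name__ == "__main__":
    for m in (0.9965, 0.97716, 0.79433, 0.63096):
        row = [fraction_with_n_changes(n, m) for n in (0, 1, 2, 3, 4)]
        print(m, " ".join(f"{x:.5f}" for x in row))
```

Under this assumption the f0 through f4 rows follow directly from M, which gives a way to cross-check individual entries of the scanned table.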
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Table 9: best vgCodon 

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES

DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R+K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! c2
. . . . . end_DO_loop ! a2
. . . . end_IF_block ! if g1 big enough
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1

WRITE the best distribution and the abundances.
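The search in Table 9 can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used for the patent: the third base is taken as an equimolar T/G mix, and "best" is taken to mean the largest ratio of least-favored to most-favored amino-acid abundance, which is consistent with the ratio reported in Table 10 but is not spelled out in the pseudocode. The names `abundances`, `steps` and `find_optimum` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def abundances(p1, p2, p3):
    """Map each amino acid (or '*' for stop) to its probability."""
    out = {}
    for (a, pa), (b, pb), (c, pc) in product(p1.items(), p2.items(), p3.items()):
        aa = CODE[a + b + c]
        out[aa] = out.get(aa, 0.0) + pa * pb * pc
    return out


def steps(lo, hi, step):
    """Values lo, lo+step, ... that do not exceed hi."""
    n = int((hi - lo) / step + 1e-9)
    return [lo + i * step for i in range(n + 1)]


def find_optimum(step=0.01):
    """Grid search of Table 9; returns (score, first-base mix, second-base mix)."""
    p3 = {"T": 0.5, "G": 0.5}  # assumed third-base mix
    best = None
    for t1, c1, a1 in product(steps(0.21, 0.31, step),
                              steps(0.13, 0.23, step),
                              steps(0.23, 0.33, step)):
        g1 = 1.0 - t1 - c1 - a1
        if g1 < 0.15:
            continue
        for a2, c2 in product(steps(0.37, 0.50, step), steps(0.12, 0.20, step)):
            # Force D+E = R+K, as in the pseudocode.
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1.0 - a2 - c2 - g2
            if g2 <= 0.1 or t2 <= 0.1:
                continue
            p1 = {"T": t1, "C": c1, "A": a1, "G": g1}
            p2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            ab = abundances(p1, p2, p3)
            aa = [ab.get(x, 0.0) for x in set(AMINO) - {"*"}]
            score = min(aa) / max(aa)  # assumed objective
            if best is None or score > best[0]:
                best = (score, p1, p2)
    return best
```

Calling `find_optimum()` with the default step walks the same grid as the pseudocode and may take some time in pure Python; a coarser step such as `find_optimum(step=0.05)` gives a quick check that the constraints are satisfiable.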



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Table 10: Abundances obtained 
from optimum vgCodon

Amino                       Amino
acid    Abundance           acid    Abundance

A       4.80%               C       2.86%
D       6.00%               E       6.00%
F       2.86%               G       6.60%
H       3.60%               I       2.86%
K       5.20%               L       6.82%
M       2.86%               N       5.20%
P       2.88%               Q       3.60%
R       6.82%               S       7.02% mfaa
T       4.16%               V       6.60%
W       2.86% lfaa          Y       5.20%
stop    5.20%

ratio = Abun(W)/Abun(S) = 0.4074

j     (1/ratio)^j     (ratio)^j        stop-free

1     2.454           .4074            .9480
2     6.025           .1660            .8987
3     14.788          .0676            .8520
4     36.298          .0275            .8077
5     89.095          .0112            .7657
6     218.7           4.57 x 10^-3     .7258
7     536.8           1.86 x 10^-3     .6881
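The lower part of Table 10 is simple arithmetic on three abundances. The sketch below assumes W = 2.86%, S = 7.02% and stop = 5.20%, as read from the damaged scan of the table, and assumes the "stop-free" column is the chance that j independent variegated codons contain no stop codon. The function name `table10_row` is hypothetical.

```python
ABUN_W = 0.0286     # least-favored amino acid (assumed reading of the table)
ABUN_S = 0.0702     # most-favored amino acid (assumed reading of the table)
ABUN_STOP = 0.0520  # stop codon abundance (assumed reading of the table)


def table10_row(j):
    """Return ((1/ratio)^j, ratio^j, stop-free fraction) for j variegated codons."""
    ratio = ABUN_W / ABUN_S
    return (1.0 / ratio) ** j, ratio ** j, (1.0 - ABUN_STOP) ** j


if __name__ == "__main__":
    for j in range(1, 8):
        inv, direct, stop_free = table10_row(j)
        print(f"{j}  {inv:9.3f}  {direct:.3e}  {stop_free:.4f}")
```

Read this way, with j codons variegated the all-S sequence is (1/ratio)^j times more abundant than the all-W sequence, which is why the ratio matters more as more residues are varied.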
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READ 
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TabLe 11: Calculate worst codon. 

Program "Find worst vgCodon within Serr of given
distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
! Serr is % error level.
READ Serr
! T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
! are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup = 1.+Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
! g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
! g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
! Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
! t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
! t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
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Table 1 1 , continued . 

. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! g2
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1

WRITE the WORST distribution and the abundances.
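The worst-case search of Table 11 can be sketched in the same way. The code below is a minimal illustration, assuming that "worst" means the smallest ratio of least-favored to most-favored amino-acid abundance; the pseudocode leaves that comparison to COMPARE-ABUNDANCES-TO-PREVIOUS-ONES. The helper names (`grid`, `push_back`, `find_worst`) are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def abundances(p1, p2, p3):
    """Map each amino acid (or '*' for stop) to its probability."""
    out = {}
    for (a, pa), (b, pb), (c, pc) in product(p1.items(), p2.items(), p3.items()):
        aa = CODE[a + b + c]
        out[aa] = out.get(aa, 0.0) + pa * pb * pc
    return out


def grid(center, serr, n):
    """n evenly spaced values from center*(1-serr) to center*(1+serr)."""
    lo, hi = center * (1.0 - serr), center * (1.0 + serr)
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def push_back(x, y, z, target, serr):
    """Derive the fourth fraction; if it strays more than serr from its
    intended value, clamp it and rescale the other three to sum to one."""
    last = 1.0 - x - y - z
    lo, hi = target * (1.0 - serr), target * (1.0 + serr)
    if last < lo or last > hi:
        last = min(max(last, lo), hi)
        f = (1.0 - last) / (x + y + z)
        x, y, z = x * f, y * f, z * f
    return x, y, z, last


def find_worst(p1, p2, serr, n=7):
    """Return (score, first-base mix, second-base mix, t3) with the lowest score."""
    worst = None
    for t1, c1, a1 in product(grid(p1["T"], serr, n), grid(p1["C"], serr, n),
                              grid(p1["A"], serr, n)):
        t1, c1, a1, g1 = push_back(t1, c1, a1, p1["G"], serr)
        q1 = {"T": t1, "C": c1, "A": a1, "G": g1}
        for a2, c2, g2 in product(grid(p2["A"], serr, n), grid(p2["C"], serr, n),
                                  grid(p2["G"], serr, n)):
            a2, c2, g2, t2 = push_back(a2, c2, g2, p2["T"], serr)
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            q2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            for t3 in (0.5 * (1.0 - serr), 0.5, 0.5 * (1.0 + serr)):
                ab = abundances(q1, q2, {"T": t3, "G": 1.0 - t3})
                aa = [ab.get(x, 0.0) for x in set(AMINO) - {"*"}]
                score = min(aa) / max(aa)  # assumed measure of quality
                if worst is None or score < worst[0]:
                    worst = (score, q1, q2, t3)
    return worst
```

With the default `n=7` the loops match the "7 steps" of the pseudocode; a smaller `n` is convenient for a quick trial.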




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il 

i 
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Table 13, continued. 

R i 20 21 23 24 25 26 27 23 29 30 31 32 33 

-« Z - " E 

---------TP 

"2 2-L2RK---RR-EX 

-1 P-QDDN---QK-RT 

1 RRH-HRRIKTRRRCD 

2 RPRPPPKEVHHPFL 

3 KYTKKT.CDAR PDL? 
A ^AFFFFDSADDFDI 

5 cccccccccccccc 

6 IEKYYNEQWDDLTE' 

7 LLLLLLLLLKKESQ 
3 HIPPPLPGPPPPPA 
9 RVA-AAPKYVPPPPFG 

10 NAE DDEVSIDD YVD 

1 1 PA PPPTVARKTTTA 

12 GCCGGCGCCG KGGG 

13 RPPR RRPPPS'IPpl 
1** CCCCCCCCCCCCCC 
15 YMKKLWRMR - - KRF 
1* DFAAAAAGACQAAG 
I 7 KFSHYLRMFPTKGY 
-18 rillMIFTIVVMFM 

19 PSPPPPPSQRRIKK 

20 AAARRARR LAARRL 

21 efffffyywffyyy' 

22 YYYYYYY FAY Y FNS 

23 YYYYYYYYFYYYYY 
^ 4 W S N D N W M fJ D D K N N K 

25 QKWS PSSGATPATQ 

26 KGA AAHSTVRS KRE 

27 KAASGLSS KLAATT 

28 KMKNNHKMCKKCKK 

2 9 QKKKKKRAK7RFQN 

30 CCCCCCCCCCCCCC 

31 E. YQNEQEEVKVEEE 

32 R PLKKKKTLAQTPE 

33 FFFFFFFFFFFFFF 
2- DTHIINIQPQRVKI 

35 WYYYYYYYYYYYYY 

36 S5GGCGCGGRCCGG 

37 C'GCGGGGGCGGGGG 

38 CCCCCCCCCCCCCC 

3 9 CRKPRGGM-QDDKKQ 

40 GCGGCCGCCCCAGG 

41 M -N M N N N [f N K D D K N N 

42 SAAAAAAGCHHSGD 
^ 3 N N N H N N M U u C C N N N 





I;, 
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Table 13, continued. 



?1 



R J 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 

59 

60 



20 21 22 
R R R 
F 
K 
T 

r 

E 
E 
C 
R 
R 
T 
C 
V 
V 



F 
K 
T 
I 
E 
E 
C 
R 
R 
T 
C 
I 
C 



23 24 25 
N N N 
F 
K 
T 
W 
D 
E 
C 
R 
H 
T 
C 
V 
A 
S 
G 
I 



26 27 28 29 
M N K N 



30 31 32 33 
N N R R 



F 
K 
•T 
W 
0 
E 
C 
R 
Q 
T 
C 
C 
A 
S 
A 



F 
H 
T 
L 
E 
E 
C 
E. 
K 
V 

c 

G 
V 
R 
S 



F 
K 
T 
E 
T 
L 
C 
R 
C 
E 
C 
L 
V 
Y 
P 



'V. 



20 Dendroaspis angusticeps (Eastern Green Mamba)
   C13 S2 C3 toxin (DUFT85)
21 Dendroaspis polylepis polylepis (Black Mamba) B toxin
   (DUFT85)
22 Dendroaspis polylepis polylepis (Black Mamba) E toxin
   (DUFT85)
23 Vipera ammodytes TI toxin (DUFT85)
24 Vipera ammodytes CTI toxin (DUFT85)
25 Bungarus fasciatus VIII B toxin (DUFT85)
26 Anemonia sulcata (sea anemone) 5 II (DUFT85)
27 Homo sapiens HI-14 "inactive" domain (DUFT85)
28 Homo sapiens HI-14 "active" domain (DUFT85)
29 beta bungarotoxin B1 (DUFT85)
30 beta bungarotoxin B2 (DUFT85)
31 Bovine spleen TI II (FIOR85)
32 Tachypleus tridentatus (Horseshoe crab) hemocyte
   inhibitor (NAKA87)
33 Bombyx mori (silkworm) SCI-III (SASA84)



Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we
have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, &

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine
bridges



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Table 15: Aoino acids observed at each Residue 
BPTI honologues 



Res. 
-5 

-4 

-3 
-2 
-1 

I 

2 



5 
6 
7 
8 
'J 
10 
1 1 
12 
13 
14 
15 
16 
17 
13 
19 
20 
2 1 
22 
23 
24 
25 
26 
27 
23 
29 
30 
31 
32 
33 
34 
35 
36 
37 
33 
39 



Nur±>er 
Di f f ernet 
AAs 

2 

2 

5 
10 
10 
10 

9 
10 

7 

1 
10 

5 

7 

9 
10 
10 

2 

5 

3 
12 

7 
12 

6 

7 

5 

4 

6 

2 

4 
10 

9 

5 

7 
10 

1 

7 
11 

1 
11 



Contents' 
0 -32 
E -32 

T P F Z -29 

Z3 R3 Q2 T2 H C L K E -18 
04 T2 P2 Q2 E C N K R -13 
R21 A2 K2 H2 P L I T C 0 
F20 R4 A2 H2 N E V F L 
D15 K6 T3 R2 P2 S Y G A L 
F19 D4 L3 Y2 12 A2 S 
C33 

Lll E5 U4 K3 Q2 12 Y2 D2 T 

L18 Ell K2 S Q 

P26 H2 A2 I L C F 

P17 A6 V3 R2 Q L K Y F 

Yll E7 04 A2 tI2 R2 V2 S I C 

T17 P5 A3 R2 I S Q Y V K 

G32 K 

P22 R6 L3 N I 
C31 T A 

K15 R4 Y2 M2 L2 -2 
A22 G5 02 R K 0 F 
R12 K5 A2 Y3 H2 S2 F2 
12 1 M< F3 L2 V2 T 
111 P10 R6 S2 K2 L Q 
RIO A7 S4 L2 Q 
Y13 F13 W I 
F14 'HA H2 A II S 
F 

S 



V C A I U F 



L M T G P 



Y32 

M26 K3 03 
A12 S5 Q3 P3 
K16 A6 T2 E2 
A18 S3 K3 L2 
G13 K10 H5 02 



L2 T2 K C 
R2 G H V 

H M 



L9 Q7 K7 A2 F2 R2 M C T N 
C33 

Q12 Ell L4 K2 V2 Y U 

T12 PS K4 Q3 E2 L2 C V S R A 

F33 

VI 1 13 T3 02 112 Q2 F H P R K 
Y31 W2 



BPTI 



R 
P 
D 
F 
C 
L 
E 
P 
P 

v . 

T 

C 

P 

C 

K 

A 

R 

T 

I 
P. 

Y 
F 
Y 
U 
A 
K 
A 
G 
L 
C 
Q 
T 
F 
V 
Y 
G 
G 



EC 



m 

i 



r. 



r:. - - *- . 




Q 
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Table 16: Exposure in BPTI 
Coordinates taken from
Brookhaven Protein Data Bank entry 6PTI.

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND 2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR A.WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Angstroms squared.

                        Not                   Not
              Total     covered               covered
Residue       area      by M/C    fraction    at all    fraction

ARG   1      342.45     205.09    0.5989      152.49    0.4453
PRO   2      239.12      92.65    0.3875       47.56    0.1989
ASP   3      272.39     158.77    0.5829      143.23    0.5258
PHE   4      311.33     137.82    0.4427       43.21    0.1388
CYS   5      241.06      48.36    0.2006        0.23    0.0010
LEU   6      280.98     151.45    0.5390      115.87    0.4124
GLU   7      291.39     128.91    0.4424       90.39    0.3102
PRO   8      236.12     128.71    0.5451       99.98    0.4234
PRO   9      236.09     109.82    0.4652       45.80    0.1940
TYR  10      330.97     153.63    0.4642       79.49    0.2402
THR  11      249.20      80.10    0.3214       64.99    0.2608
GLY  12      184.21      56.75    0.3081       23.05    0.1252
PRO  13      240.07     130.25    0.5426       75.27    0.3126
CYS  14      237.10      75.55    0.3186       53.52    0.2257
LYS  15      310.77     200.25    0.6444      192.00    0.6178
ALA  16      209.41      66.63    0.3182       45.59    0.2177
ARG  17      351.09     243.67    0.6940      201.48    0.5739
ILE  18      277.10     100.51    0.3627       58.95    0.2127
ILE  19      278.03     146.06    0.5254       96.05    0.3455
ARG  20      339.11     144.65    0.4266       43.81    0.1292
TYR  21      333.60     102.24    0.3065       69.67    0.2089
PHE  22      306.08      70.64    0.2308       23.01    0.0752
TYR  23      338.66      77.05    0.2275       17.34    0.0512
ASN  24      264.88      99.03    0.3739       38.69    0.1461
ALA  25      211.15      85.13    0.4032       48.20    0.2283
LYS  26      313.29     216.14    0.6899      202.84    0.6474
ALA  27      210.66      96.05    0.4560       54.78    0.2601
GLY  28      186.83      71.52    0.3828       32.09    0.1718
LEU  29      280.70     132.42    0.4718       93.61    0.3335
CYS  30      238.15      57.27    0.2405       19.33    0.0812
GLN  31      301.15     141.80    0.4709       82.64    0.2744
THR  32      251.26     138.17    0.5499       76.47    0.3043


6PTI 






Table 16, continued . 



PHE  33      304.27      59.79    0.1965       18.91    0.0622
VAL  34      251.56     109.78    0.4364       42.36    0.1684
TYR  35      332.64      80.52    0.2421       15.05    0.0452
GLY  36      187.06      11.90    0.0636        1.97    0.0105
GLY  37      185.28      84.26    0.4548       39.17    0.2114
CYS  38      234.56      73.64    0.3139       26.40    0.1125
ARG  39      417.13     304.62    0.7303      250.73    0.6011
ALA  40      209.53      94.01    0.4487       52.95    0.2527
LYS  41      314.60     166.23    0.5284      108.77    0.3457
ARG  42      349.06     232.83    0.6670      179.59    0.5145
ASN  43      266.47      38.53    0.1446        5.32    0.0200
ASN  44      269.65      91.08    0.3378       23.39    0.0867
PHE  45      313.22      69.73    0.2226       14.79    0.0472
LYS  46      309.83     217.18    0.7010      155.73    0.5026
SER  47      224.78      69.11    0.3075       24.80    0.1103
ALA  48      211.01      82.06    0.3889       31.07    0.1473
GLU  49      286.62     161.00    0.5617      100.01    0.3489
ASP  50      299.53     156.42    0.5222       95.96    0.3204
CYS  51      238.68      24.51    0.1027        0.00    0.0000
MET  52      293.05      89.43    0.3054       66.70    0.2276
ARG  53      356.20     224.61    0.6306      189.75    0.5327
THR  54      251.53     116.43    0.4629       51.64    0.2053
CYS  55      240.40      69.95    0.2910        0.00    0.0000
GLY  56      184.66      60.79    0.3292       32.78    0.1775
GLY  57      106.58      49.71    0.4664       38.26    0.3592
ALA  58      no position given in Protein Data Bank

"Total area" is the area measured by a rolling sphere
of radius 1.4 A, where only the atoms
within the residue are considered. This
takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere
of radius 1.4 A where all main-chain atoms
are considered; fraction is the exposed
area divided by the total area. Surface
buried by main-chain atoms is more
definitely covered than is surface covered
by side group atoms.

"Not covered at all" is the area measured by a rolling sphere
of radius 1.4 A where all atoms of the
protein are considered.









LCI 

pLG2 

pLG3 
pLC4 

pLC5 

pLC6' 

pLG7 

pLG8 

pLC9 
pLCIO ' 
pLGll" 



Table 17: Piasuids used 'in Detailed Exaap'c 
rnnrpr.ts 

Ml3npl8 with fcva II/AjLt II/Acc I/OSX 
II/Sau I adaptor 
. LCI with arc" CoIEl of -PBR322 cloned 

into /Vic II/ASS I sites 
pLC2 with <£r, I site removed 
P LC3 with first part of o sp-pbd <jcne 
cloned into \LZ1. II/Sau I sites. 
Avr II/Asu II sites created 
p LG4 with second part of psp-phd qene 
cloned into tCLZ H/'&Sy " sites, ^ 
site created 

P LG5 with third part of osp-pbd •jcr.e 
cloned into A^i II/B*sH I sites, Rbe I 
site created 

pLG6 with last part of psp-aSd gone 
cloned into R&S sices 
P LC7 vith disabled psp-phd qene, saae 

length DHA. 

pLG7 nutated to display BPTI (V15 hPTI ) 
pLC3 * tet R gcr.c - a£2 a gene 
pU59 + tet R *je ne - <3 cne 



X: .i 




c 




./ 



r 



( 



10 



15 



25 



30 



40 
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Table 13: Enzyme sites eliminated when 
M13mp18 is cut by Ava II
and Bsu36 I



Aha II 
Fsp I 
EcoR I 
Sna I 
Hind III 
Hind II 



Nar I 
Bal I 
Sac I 
BanH I 

ACC I 



Gdi II 
HqiE II 
Kpn I 
Xba I 
Psj: I 



Pvu I 
Dsu 3 6 I 
I 

Sal I 
Sph I 



20 

Aat II 
Bbv II 
BstB I 
Eco57 I 
Ess I 
Khe I 
35 Pf 1M I 
Rsr I 
Spe I 
' Xca I 



Table 19: Enzymes not cutting
M13mp18



Af I I 
Bel I 
BstE II 
ScoH I 
Hpa -I 
Kot I 
PmaC I 
Sac I 
Stu I 
Xho I 



A pa I 
B^pM I 
PstX I 
ECO0109 I 
Mlu I 
Nru I 
Ppa I 
Sea I 
Sty I 



ZczR V 
Nco I 

?PuM I 
Sf i I 
TtV. 11 I 



i 



far 

t 

m 

m 
i 

m 




Table 21: Enr.ynxs to 5 



red on Ac"=:q CNA 



Ucc I 



P 



10 



Av.l HI 



Recognition 
CTMKAC 
CTTAAG P 
GCCCCC P 
TTCGAA 
ATCCAT 



20 



25 



; o 



; 5 



AvjC II 
3J-H I 
3C1 I 
3r,cM II 
SssK II 

»bscs r: 
* bssx r 

^Dra II 
+ £ccr£ I 
Ecog I 
f:coR V 

+ ESD I 

Hir.d HI 
Hca X 
Xpn - I 



L'ar 
L'i-S 

Mot 



I 
I 
: 
i 
i 

PnaC I 
+ Pou* I 
+ ?.sr II 
Sac I 

Sal I 



+ Sf i I 
Sra I 
Soe I 
Son I 
Stu I 
*Stv I 



CCTAGG 

CCATCC 

TGATCA 

TCCGGA 

CCGCGC 

GGTNACC 

CCANNSNM 

RCGNCCY 

GAAT TO 

GATATC 

CCTNACC 

AACCTT 

GTTAAC 

GGTACC 

ACCCGT 

ccczcc 

CCATGG 
GCTAGC 
CCGGCCCC 
TCGCGA 

cca:::;nnn 

CACGTG 
RGGVCCY 
CCGWCCG 
GACCTC 

GTCGAC 
CCTNAGG 

CGCc::r:»::::cci 

CCCGGG 
ACTAGT 
GCATGC 
ACGCCT 
CCWVGG 
G7ATAC 



cuts 

I & 
1 i 

5 4 

: & 

5 & 
1 i 



1 
3 
2 
1 

5 & 



4 

5 
1 

4 
1 

5 
3 
5 

5 
6 
4 

5 
6 
5 
3 
5 
5 
j 
I 

5 
4 
5 

6 



Supply 

<b,m. r p.t 

<N 

<m,:,:i,? ( t 

cP.Mf Sst.3 I) 
<T; N5_k I:M.H/P.T; 

<N 

<;i.t 

<N, ?,T 

<M.T ; ScoOlO?. I:N 
<;*i (soon) 
<S.3,M., I.M.P. T 
<5 . 3.M, I. N. 



<S. 3,M. I 
•~S , 3,M. I 



,S t ?,T 



<B.:i,T 



* *.-■:-. 



f J 

f 



3 <3 



5 & 



L i 
2 S 



<nene 
<n. r 

<3(S5t IJ ,M f I,N\?. 
T 

■:M; Cvn i: 



:T; ?.su:S I-': 1:1 

3 <b.m.:.:i.p.t 

5 <.Y.;M 
L <B. 

3 <N(sccn] 




Q 
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Table 22: ipbc* gene 

pbd modlO 29IIIS8 : 

lacUVS Rsr II/MT H/genc/IrsA attp.nuator/Kst II; ! 
5'- CGCaCCC TaT ! Est. II site 

CCACCC tttaca CTTTATGCTTCCCGCTCG tataat CTG ' ! lacUVS 



TGG 


a ATTGTG AGCGC ATA ACA ATT 








lacO 


CCT 


AGCAqq CtcaCT 










■ 


Sh in 


atg 


aag 


aaa 


tct 


ccg 


gtt 


ctt 


aag 


get 


age 


10. 


gtt 


get 


gfc 


gcg 


acc 


ctg 


gta 


ccg 


atg 


ctg 


20 


tct 


ttt 


get 


cgt 


ccg 


gat 


ttc 


tgt 


etc 


gag 


30 


ccg 


cca 


tat 


act 


ggg 


ccc 


tgc 


aaa 


gcg 


cgc 


40 


ate 


ate 


cgt 


tat 


ttc 


tac 


aac 


get 


aaa 


gca '. 


50 


ggc 


ctg 


tgc 


cag 


acc 


ttt 


gta 


tac 


ggt 


ggt . 


60 • 


tgc 


cgt 


get 


aag 


cgt 


aac 


aac 


ttt 


aaa 


teg . 


70 


gec 


gaa 


gat 


tgc 


atg 


cgt 


acc 


tgc 


ggt 


ggc . 


30 


gec 


get 


gaa 


ggt 


gat 


gat 


ccg 


gee 


aaa 


gcg . 


90 


gec 


ttt 


aac 


tct 


ctg 


caa 


get 


tct 


get 


acc . 


100 


gaa 


tat 


ate 


ggt 


tac 


gcg 


tgg 


gec 


atg 


gtg ! 


110 


gtg 


gtt 


ate 


gtt 


ggt 


get 


acc 


ate 


ggt 


ate . 


120 


aaa 


ctg 


ttt 


aag 


aaa 


ttt 


act 


teg 


aaa 


gcg ! 


130 


tct 


taa 


tag 


tga 


qnttacc 


i 


BstS II 







M13 leader 



agtcta agcccgc ctaatga geggget tttttttt I teramator 
CCTgACC -3 ' - ! Mst II 



■j4 



9 O 
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Table 23: iobd DMA sequence 

DNA Sequence file = UV5_M13PTIM13.DNA;17
DNA Sequence title =
pbd mod10 29III88 : lac-UV5 Rsr II/Avr II/gene/TrpA

attenuator/Mst II ; !

1 C|GCA|CCC|TAT|CCA|CCC|TTT|ACA|CTT[TAT|GCT|TCC|GCC|TCC| 
41 TAT | AAT | GTG | TGG | AAT | TGT | GAG j CGG j ATA | ACA | ATT | CCT | AGG j AGG | 
83 CTC j ACT | ATG j AAG | AAA | TCT | CTC | CTT [ CTT | AAG [ GCT j AGC j GTT j CCT j 
12 5 GTC | GCC | ACC | CTC | GTA | CCC | ATC | CTG | TCT 1 TTT | CCT | CGT | CCC[ GAT I 
167 TTC!TGT|CTC|GAC|CCC|CCA|TAT|ACT!CCG|CCClTCC|AAA!CC=!CCCi 
209 ATC | ATC | CCT | TAT | TTC | TAC | AAC | GCT | AAA | CCA | GCC | CTG | TGC ! CAC j 
251 ACC | TTT | CTA | TAC | CCT | GCT | TGC j CGT ] CCT J AAC | CCT [ AAC | AAC \ TTT | 
29 3 AAA | TCG | CCC | CAA | CAT | TCC | ATC | CCT | ACC | TGC ( CGT \ GCC j GCC \ CCT ! 
3 35 CAA|CGT|CAT|CAT|CCG|GCC!AAv\iGCGlCCC|TTT| AAC | TCT ! CTG | CAA | 

3 7 7 CCT j TCT | CCT | ACC | CAA | TAT | ATC \ CCT | TAC | CCC | TCG ] GCC j ATC ! CTC J 

4 19 CTC | CTT | ATC | CTT | CGT | CCT | ACC | ATC | CGT | ATC [ AAA | CTG j TTT | AAG | 
461 AAA | TTT | ACT | TCC | AAA | CCC | TCT | TAA | TAG | TGA| CGT | TAC) CAC | TCT| 
503 AAG|CCC|CCC|TAA|TCA|CCC|CGC|TTT|TTT|TTT|CCT|CAG|C 

Total = 539 bases 
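A restriction summary of the kind shown in Table 24 can be produced by scanning a sequence for each recognition site. The sketch below is a generic illustration, not the program used for the patent: it expands IUPAC ambiguity codes into a regular expression and reports the 1-based start of every match on either strand. Start positions need not coincide with the cut positions listed in the table, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYMKSWNBDHV", "TGCAYRKMSWNVHDB")


def reverse_complement(site):
    """Reverse complement of a sequence or recognition site (IUPAC aware)."""
    return site.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, recognition):
    """Return sorted 1-based start positions of the site on either strand."""
    seq = seq.upper()
    forms = {recognition.upper(), reverse_complement(recognition)}
    hits = set()
    for form in forms:
        pattern = "".join(IUPAC[b] for b in form)
        # A lookahead lets overlapping occurrences be reported as well.
        for m in re.finditer("(?=(" + pattern + "))", seq):
            hits.add(m.start() + 1)
    return sorted(hits)


if __name__ == "__main__":
    demo = "AAGCTAGCTTCCTAGGAACCTGAGGTT"  # made-up sequence for illustration
    for name, site in (("Nhe I", "GCTAGC"), ("Avr II", "CCTAGG"),
                       ("Sau I", "CCTNAGG")):
        print(name, site, find_sites(demo, site))
```

An enzyme with exactly one hit in the construct is a candidate for a unique cloning site; an empty list corresponds to the "enzymes that do not cut" part of the summary.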



i i 



m 

m 

m 
mi 

m 
tar 

m 
m 



m 
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Table 24: Surr.r.ary of Restriction Cuts 



y 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



tAcc I has 
Acc III has 
Acy I has 
Af 1 II has 
* Afl III has 
Aha III has 
Aua I has 



Aso7 l 3 
Asu II 
%Ava I 
Avr II 
* Ban I 
Bbe I 
+ Bgl I 
+ B_LQ I 
t BspM I 



has 
has 
has 
has 
has 
has 
has 
has 
has 



BssH II has 
+- gstE II has 
* 3stX I has 
Cfr I has 
+ p_ra II has 
t- Esp I has 
\ Fo* I has 




%:-:aiJ II has 
Hind III . has 
-Hch I has 
Ken I has 
+ £i£o' II has 
Mlu I has 



Nar I 

Nco I 

::he I 

Nru I 



has 
has 
has 
has 



1 
1 

sp(7524) has 
scB II has 



> Pf 1M I 
- Fss I 
+ Rsr II 
+S*u I 
» Sfa» I 
+S_£i I 
Son I 
Stu I 
* stv I 



has 
has 

has 
has 

has 
has 
has 
has 
has 



1 observed sites : 
1 observed sites : 
1 observed sites : 3 
1 observed sites : 
1 observed sites 
1 observed sites : 
I observed sites : I 
1 observed sites : 
I observed sites : 
L observed sites : 
1 observed sites : 
3 observed sites : 

1 observed sites : 3 
1. observed sites : 

1 observed sites : 
1 observed sites : 
1 observed sites : 
1 observed sites 
1 observed sites : 

2 observed sites : 2 

1 observed sites : 
1 observed sites : 

1 observed sites : 

2 observed sites : 

L observed sites : 2 
I observed sites : 
1 observed sites : 
. 3 observed s ites : 
1 observed sites 
1 observed sites 
1 observed sites : 
1 observed sites : 1 

2 observed sites : 
1 observed sites : 4 
1 observed sites : 3 
1 observed sites : 4 

observed sites : 1 
observed sites : 1 
1 observed sites 
L observed sites : 
1- observed sites : 
1 observed sites : 

1 observed sites : 
I observed sites 

2 observed sites 

1 observed sites 
I observed sites : 
1 observed sites : 

2 observed sites 



259 

162 
23 
109 
: 404 

292 
93 . 
133 • 
47 1 
175 
76 

138 328 540 
28 
352 
346 
319 
205 
: 493 

4 13 
99 350 

193 
277 
213 

299 350 
40 
323 
473 

133 323 540 
: 193 
: 377 
340 
38 

93 304 

04 
28 
13 
15 
23 
: 311 
332 
13 4 
193 

3 

535 

144 
351 
1 1 



209 



3 



is 



m 



I 





•:r ■ - 



0 



© 
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Table 24, continued. 



Enz » Xca I has 
Enz « Xho I has 
Enz = Xjna III has 



1 observed sites : 259 
1 observed sites : 175 
1 observet sites : 299 



Enzymes that do not cut 




AuaL I 
3bv r 
3scH I 
tccR I 
'vst I 
Pr.aC I 
Sac r 
SnaB I 



Ase I 
Rbv II 

Cla : ■ 

CccR V 
Nae I 
PruH I 
Sac II 
Spe I 



Tth 1 I 1 II X£o i: 



Ava III 
3c 1 I 
Dra III 
KgiA I 
Nee I 
Pst I 
Sal I 
Ssd I 
Xna r 



t ;. \ ^ 



Xr,n I 
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Table' 25: Annotated Sequence of jpb-i gene 



5 ' - C | CCA | CCC I TAT | CCA | CCC I TTT | ACA I CTT | TAT | 
' | Rsr II 1 I -35 ! 



| CCT | TCC | CCC | TCC I TAT | AAT | CTC I TCC I 
I -10 L 



28 



52 




I AAT | TCT | CAC J CCC | ATA | ACA | ATT I 
I lac operator [ 



I CCT [ ACC I ACC | CTC | ACT | 
1 Av r III 



1 S. P. | 



1 
ATC 



k 
2 
AAG 



K S 
3 4 
AAA I TCT 



1 
5 

CTC CTT 



1 
7 

CTT 



3 

AAG 



a f i r t 



73 



88 



a 


s 




9 


10 




CCT 


AGC 


113 


N^e I 





Iv a 
llj 12 
( GZ&\ CCT 



13 
CTC 



a 
14 
CCC 



t 

15 

ACC 



1 rfru I? 



1 

16 
CTC 



17 

;ta 



1 Kon I I 



p | n j 1 | 
13 i 19 I 20 j 
CCC I ATC j CTC | 



14S 



c 



s 

21 
TCT 



f 

22 
TTT 



a 

23 
CCT 



r 
24 
CCT 



P 

25 
CCC 



d 
26 
CAT 



| Acd II j 



f I c 
2 T | 23 
TTC I TCT 



I 

29 



to I 



CTC CAC 
Av£\__rl 



Xho r ! 



p 


P 


y 


t 


9 


P 1 c [ k. 


a 


31 


32 


33 


34 


35 


36) 37; 33 


3 9 


CCC 


CCA 


TAT 


ACT 


CCC 


CCC [ TCC | AAA 


CCC 




Pf\M X 


1 


j 





40| 



t Apa T [ 
| Dra It | 
T~Pss_I [ 



i 


i 


r 


y 


t 


y 


n 


a 


4 1 


42 


43 


44 


45 


46 


47 


48 


ATC 


ATC 


CCT 


TAT 


TTC 


TAC 


AAC 


CCT 



V. I 
49 I 
AAA | 



178 



203 



235 






J | 





i.-Wrj ' »*i^^^V•.;*tt.-, 



.0 
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a 


g 


5, 


50 


51 




CCA 


CCC 


CTC 


I 


Stu 


II 


c 


r 


a 


61 


62 


61 


TCC 


CCT 


CCT 


s 


a 




70 


71 




TCC 


CCC 


CAA 


IXmaril! 


g 


a 


a 


80 


81 


82 


CCC 


CCC 


CCT 


Pbe X 




liar t 





c 
53 



k 
64 



q 

54 
CAC 



r 

65 
CCT 



t 

55 
ACC 



25, 


con*; inued . 


t 


V 


y 


g 1 g 


56 


57 


53 


59 J 60 


TTT 


CTA 


TAC 


CCT | CCT 




ACC r 






Xca I 





n n 
661 67 
AAC | AAC 



d 
73 



c 
83 
CAA 



c 
74 



m 

75 
ATC 



r 

76 
CCT 



f 

13 
TTT 



t 

77 
ACC 



k 

69 
AAA 



c 

78 
TCC 



g ! 

79] 
CCT I 



I Sph 1 1 



g 

84 
CCT 



I P 
87 
CCC 



a | k j a 
88 39 90 
CCC | AAA [CCC 

sf j i : 



d 

85 
CAT 



a 

91 
CCC 



d 

86 
CAT 



f 


n J s 


1 


q 


a 


s 


a 


z 


92 


93 94 


95 


96 


97 


93 


99 


100 


TTT 


AAC j TCT 


CTC 


CAA 


CCT 


TCT 


CCT 


ACC 



(Itir.d 3 t 



I G 
101 

j CAA 



a 
103 
CCC 



y 

102 
TAT 



i g 

103 104 
ATC J CCT 



y 

105 
TAC 



a 
106 
CCC 



107 
TCC 



LJlluJU. 



n 
109 
ATC 
3stX 



v v. 
1 10 I 111 
CTC j CTC 
I 



v 
112 
CTT 



T'co l\ 



268 



295 



325 



346 



361 



3U8 



409 



424 
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Tabic 25, continued. 





i 


V 


9 


a 


113 


114 


115 


116 


ATC 


CTT 


CCT 


CCT 


k 


1 


f 


.k 


121 


122 


123 


124 


AAA 


CTC 


TTT 


AAC 


s 








131 


132 


133 


134 


TCT 


TAA 


■ TAG 


TCA 



c 

117 
ACC 



113 
ATC 



k I t 
125 I 126 
AAA > TTT 



119 | 120 
CCT I ATC 



t I s j k I a | 
127 128 129 I 1301 
ACT | TCC i AAA | CCC | 
»ASU IT! 



448 



CCT ! TAC | CAC | TCT | 
SstS III 



| AAC | CCC j CCC | TAA I TCA | CCC j CCC | TTT | TTT | TTT I 
I Trp teminatcr J 



I CCT [ GAG | C - 3 ' 
I San I L 



Note the following enzyme equivalences,



502 



53: 



539 



Xma III   = Eag I
Acc III   = BspM II
Dra II    = EcoO109 I
Asu II    = BstB I
Sau I     = Bsu36 I




Table 27: DtfA_synthl 
5' I CCC I TCC I GTC I CCA I CCC I TAT ! CCA \ CCC I TTT I ACA I CTT I TAT I 
[ CCT I TCC I CCC I TCC 1 TAT I AAT ! CTC | TCC I 



! AAT I TCT I GAG I CCC I AT A f ACA I ATT ! 

dig?-! - 3 ' - g t taa 



ICCTlACCl, 
gga tec 



/ 3 ' - ol ig $ 3 
1 CCC I CCT 1 CCT I TCC 1 AAA | CCC j 
egg cga gqa age ttt cgc 



| TCT I TAA | TAG | TCA | GGT j TAC | CAC | TCT | 
aga af.t ate act cca atg gtc aga 



| AAG | CCC | GCC | TAA [ TGA | CCC [ CCC [ TTT | TTT | TTT [ 
ttc egg egg att act cgc ccg aaa aaa aaa 



| CCT | CAG | CCA | GGT [ CAG j CO 
gga etc cgt cca etc ye - 5' 



"Top" strand 00 

"Bottom" strand 100 

Over Lap 2 3 (14 c/g and 9 a/ t.) 

Net length 158 



^ fifth 'inrf" 
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Tabic '23: DIIA_scq2 

5'- |gc*a|cca|acg| 
spacer I 




CCT | ACC I ACG I CTC | ACT | 



I s, p- I 





k 


k 


s 


1 


V 


1 


k 


a 


s 


1 


2 


^ 


4 


5 


6 " 


7 


3 


9 


10 


ATC 


AAC 


AAA 


TCT 


CTC 


GTT 


err 


AAC 


CCT 


ACC 














Afi ir 


Mho r 


V 




V 


a 


t 


1 


V 


P 




1 


11 


M 


13 


14 


15 


16 


17 


18 


19 


20 


•*;tt 


GCT| 


CTC 


ccc 


ACC 


CTC 


CTA 


CCC 


ATC 


CTC 






1 Mm I! 


I 




I ! 







s 




a 


r 


P 


d 


f 


c 


1 


e- 1 


2 1 


l» 


23 


2 4 


25 


26 


27 


28 


29 


30 


TCT 


, TTT 


CCT 


CCT 


CCC 


CAT 


TTC 


TCT 


CTC 


CAC ; 








1 ACCtll 1 






Ava I 


















Xho I 1 


P 


P 


y • 


t 


g 


P 


c 


k 


a 


r 


3 1 


32 


33 


34 


35 


36 


37 


33 


39 


40 


CCC 


CCA 


TAT 


ACT 


CCC 


CCC 


TCC 


AAA 


CCC 


CCC 




PflM r 


1 








Hr.sH IT 



Aon I I 



Dra II 



Pss I 



if 

Sf . 

1*- ■ 

f/ "• 

tv. 

£\ 

I 



ate 



l r 
42 43 
ate | cqt 



t | s I k 
127 | i:8l 129 
ACT | TCC j 'AAa 



qcq|qct|qcq| - 3' 
r.pacer L 



m 
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Table 29: DNA_synth2 
5'- [CCA | CCA I ACC! 
| CCT | ACC I AGC f CTC I ACT I 
f ATC I AAC | AAA 1 TCT I CTC 1 CTT I CTT ' A AG I CCT [ ACC I 



| CTT I CCT I GTC I CCC I ACC [ CTC I CTA I CCG I ATG 1 CTG I 
oligr6 = 3'- ggc tac gac 

/ 3 ' « ol igi 5 
j TCT I T T T f CCT I CCT I CCC [ GAT | TTC | TGT | CTC J GAG | 
aga aaa cga gca ggc eta aag aca gag etc 



| CCG | CCA | TAT | ACT ( CCG \ CCC | TGC | AAA | GCG | CCC | 
ggc ggt ata tga ccc ggg acg ttt cgc gcg 



| ATC | ATC | CCT j 
tag tag gca 



"Top" strand 
"Bottom" strand 90 



| ACT | TCC | AAA | GCG | CCT | CCC | 
tga age ttt cgc cga cgc - 5' 



99 



Overlap 
Net- length 



24 (14 c/g and 10 a/t) 
155 



m 

^ J 
%•->>. 



W 



ifc ' i i 1 i. ■ , T'i'i ",, , „i i ,■■ 1 V ■HEM 



279 



Table 31: DMA_synth3 
S ' - 1 CCC I TGC ! ACA I CCG I CCC I 
| ATC ( ATC 1 CGT I TAT ' TTC I TAC i A AC I CCT I AAA I 



[ CCA I CCC I CTC ) TGC : CAC I ACC I TTT ! OTA I TAC | CCT ! GCT I 
olig?3 = 3 ' - g cca cca 



/ 3' = oligjf7 
| TGC I CCT t GCT I AAC ' CGT ! AAC i AAC | TTT J AAA j 
acg gca cga tte gca teg ttg aaa ttt 



"|| 
•I 

.1 



| TCG | GCC | GAA | CAT I TGC | ATC | CCT j ACC J TGC | CGT | 
age egg ctt eta acg tac gca tgg acg cca 



| GGC | CCC | GCT I GAA | 
ccg egg .cgt etc 



| TTT | ACT | TCC | AAA | CCG j TCG j CCC J 
aaa.tga age ttt cqc age ggc -5' 



"Top" strand 
"BottoD" strand 
Overlap 
Net length 



93 
97 

25 (15 g/c & 10 a/t) 
146 





0 



:so 



Table 32: CNA_seq4 





9 


a 


a 


e 


g 


5' 


80 


81 


82 


33 


84 


|cct]cgc|cct 


GCC 


ccc 


CCT 


CAA 


CCT 


| spacer 


Bbe T 










Mar T 









d | d | 
85 1 SS| 

cat | cat; 



p 

87 
CCC 



a 

88 
CCC 



sfi r 



k I a 
89 f 90 
AAA ( CCC 



a 

SI 
GCC 



f 


n 


s 


1 


a 1 a 


s 


a 


92 


93 


94 


95 


9 6 1 97 


98 


99 


TTT 


AAC 


TCT 


CTC 


CAA GCT 


TCT 


CCT 










|Hind 3l 





t I 

100; 

ACC I 



e 
101 
GAA 



y 

102 
TAT 



1 

103 
ATC 



g I y 

104 105 
GGT | TAC 



a 
106 
GCG 

Mlu 



107 
TGG 



I a I □ 
j 1C8| 109 
I CCC | ATC 



v 
110 
GTC 



lll| 112 
GTG I CTT 



BstX I 



N'co II 



1 
113 
ATC 



114 
GTT 



g | a ! t 
115 116! 117 
GGT CCT ACC 



1 

118 
ATC 



g | i I 

119 ! 120 
CCT | ATC | 




■ , l_ • ' I 




■ «* vi 



28 1 

Table 33: D::A_synth4 
5' I GCT 1 CGC 1 CCT'I CGC I CCC t GCT I CAA I OCT I CAT ! C AT ' 
ICCClGCCl AAA | CCC 1 CCC I 

| TTT I AAC I TCT | CTG I CAA I OCT ■ TCT 1 GCT I ACC | 

f CA A I TAT ! AT C j GCT I TAC ' CCC [ TCG I 
oligSlO =3'- oca tag cca atg cgc acc * 



/ 3' = olig$9 
|CCCl ATC I C TG | GTC | GTT | 
egg tac cac cac caa 

| ATC | GTT | CGT | CCT J ACC | ATC | GCT | ATC } 
tag caa cca cga tgg tag cca tag 



| AAA | CTC | TTT | AAC | AAA | TTT ] ACT | TCC | AAA | CCC [ TCT j TCA | 
ttt gac aaa ttc ttt aaa tga age ttt cgc aga act - 5' 



"Top" strand 100 

"Bottom" strand 93 

Overlap 25 (14 c/g and II a/t) 

Net length 149 
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Table 24: Some interaction sets in BPTI 



Res. 
J 



-4 

-3 
-2 
-1 

1 

2 

3 

4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
2S 
26 
27 
28 
29 < 
.30 
31 
32 
33 
34 
35 
36 
37 
38 
39 



Nurabe] 
Diff . 
AAs 



Contents 



2 

■ 2 
5 

10 

10 

10 
9 

10 
7 
1 

10 
5 
7 
9 

10 

10 
2 
5 
3. 

12 
7 

12 
6 
7 
5 
A 
6 
2 
4 

10 
9 
5 
7 

10 



D -32 
E -32 

T P F Z -29 

23 R3 Q2 T2 H G L K E -18 
04 T2 P2 Q2 E C U K R -18 
R21 A2 K2 H2 P L I T C D 
P20 R4 A2 112 N E V F L 



12 3 4 5 



Y C 
S 



A L 



015 K6 T3 R2 P2 S 

F19 D4 L3 Y2 12 A2 
C33 

Lll E5 N4 K3 Q2 12 Y2 02 T 
L18 £11 K2 S Q 

P26 H2 A2 I L G F 

P17 A6 V3 R2 Q L K Y F 

Yll E7 04 A2 N2 R2 V2 S 

T17 P5 A3 R2 I S Q 
G32 K 

P2 2 R6 L3 M I 
C31 T A 

K15 R4 Y2 M2 L2 -2 



I 0 



Y V K 



V G A I N F 



A22 G5 Q2 R K 0 
R12 K5 A2 Y3 H2 S2 F2 L M T G P 
121 M4 F3 L2 V2 T 
111 P10 R6 S2 K2 L Q 
R19 A7 S4 L2 Q 
Y18 F13 W I 
F14 YI4 H2 A N S 
Y32 F 

N2C K3 D3 S 

A12 S5 Q3 P3 W3 L2 T2 K C R 

K16 A6 T2 E2 S2 R2 C H V 

A18 SB K3 L2 T2 

G13 K10 N5 Q2 R II H 

L9 Q7 K7 A2 F2 R2 M C T N 

C33 

Q12 Ell L4 K2 V2 Y M 
T12 PS K4 Q3 E2 L2 G 
F3 3 



V S R A 



Vll 18 T3 
Y31 W2 
G27 S5 R 
C33 

C31 T A 
R13 CO K4 Q3 



C2 Q2 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 



P 
C 
K 
A 
R 
I 
I 
R 
Y 
F 
Y 
N 
A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
C 
C 

c 

R 



3 

S 3 
s 

s 3 
x 
s 
s 

s 3 
s 
3 
s 
3 
s 
s 
3 
s 
3 
s 
3 
3 
s 
3 
x 



kg? 



m 
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Table 34: continued. 



       Number
Res.   Diff.
 #     AAs    Contents                      BPTI   1 2 3    4    5

 40     2     G22 A11                        A       s      s    5
 41     3     N20 K11 D2                     K              4    s
 42     9     A11 R9 S4 C3 H2 D Q K N        R              s    5
 43     2     N31 G2                         N                   s
 44     3     N21 R11 K                      N                   s
 45     2     F32 Y                          F                   s
 46     8     K24 E2 S2 D H V Y R            K                   5
 47     2     T19 S14                        S       s           5
 48     9     A11 I9 E4 T2 W2 L2 R K D       A      2 s          s
 49     7     E19 D6 A2 Q2 K2 T H            E       2           s
 50     6     E16 D12 L2 M Q K               D       s           5
 51     1     C33                            C       X           X
 52     7     R13 M10 L3 E3 Q2 H V           M       2           s
 53     8     R21 Q3 E2 H2 C2 G K D          R       s           5
 54     7     T23 A3 V2 E2 I Y K             T                   5
 55     1     C33                            C                   X
 56     8     G15 V8 I3 E2 R2 A L S          G
 57     8     G19 V4 A3 P2 -2 R L N          G
 58     8     A11 -10 P3 K3 S2 Y2 R F        A
 59     9     -24 C2 Q E A Y S P R
 60     6     -28 Q R I G D
 61     3     -31 T P
 62     2     -32 D
 63     2     -32 K
 64     2     -32 S











s indicates secondary set
x indicates in or close to surface but buried and/or
  highly conserved.
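The Contents column of Table 34 can be read mechanically. The following is a minimal sketch, assuming each token is a one-letter amino-acid code (or "-" for a gap) followed by an optional count that defaults to 1. The function name `parse_contents` is hypothetical, and the total of 33 aligned sequences is inferred from fully conserved rows such as C33 rather than stated in the table.

```python
import re

def parse_contents(contents):
    """Parse a 'Contents' cell such as 'N20 K11 D2' into {symbol: count}.

    Assumption: a bare letter (no number) stands for a single occurrence,
    and '-' marks sequences with no residue at that position.
    """
    counts = {}
    for token in contents.split():
        match = re.fullmatch(r"([A-Z-])(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        symbol, number = match.groups()
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

row_41 = parse_contents("N20 K11 D2")
# Number of different amino acids, and number of sequences tallied.
print(len(row_41), sum(row_41.values()))
```

Checking that the number of distinct symbols matches the "Number Diff. AAs" column, and that the counts add up to the same total in every row, is a quick way to catch transcription errors in a row.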
Table 36: Distances, BPTI residue set #2
Distances in Angstroms between C betas.
Hypothetical C beta was added to each Glycine.
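A table of this kind can be computed from atomic coordinates. The sketch below is illustrative only: the text does not say how the hypothetical C beta was built for glycine, so it assumes a common ideal-geometry construction from the backbone N, C alpha, and C atoms. The function names and all coordinates are hypothetical.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def virtual_cbeta(n, ca, c):
    """Place an idealized C beta from backbone N, CA, C coordinates (Angstroms).

    Assumption: fixed ideal-geometry coefficients, not the patent's own method.
    """
    b = _sub(ca, n)
    c_vec = _sub(c, ca)
    a = _cross(b, c_vec)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i]
                 - 0.54067466 * c_vec[i] + ca[i] for i in range(3))

def cbeta_distance_table(cbetas):
    """Return {(name_i, name_j): distance} for every unordered pair of residues."""
    names = list(cbetas)
    return {(p, q): math.dist(cbetas[p], cbetas[q])
            for i, p in enumerate(names) for q in names[:i]}

# Hypothetical glycine backbone with ideal bond lengths and angle.
gly_n, gly_ca, gly_c = (-1.458, 0.0, 0.0), (0.0, 0.0, 0.0), (0.5465, 1.4237, 0.0)
gly_cb = virtual_cbeta(gly_n, gly_ca, gly_c)

# "GLY" and "OTHER" are placeholder labels; the second coordinate is made up.
table = cbeta_distance_table({"GLY": gly_cb, "OTHER": (4.0, 3.0, 0.0)})
for pair, distance in table.items():
    print(pair, round(distance, 1))
```

With real coordinates, each residue label in the table would map to its observed C beta, or to the constructed one for glycine.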



I19
Y21
A27
G28
L29
Q31
T32
V34
A48
E49
M52



R17 
V . 7 



I19 Y21 A27 G28 L29 Q31 T32



A48 



15, 
22 . 



26.6 
22 . 5 
16. 1 
11.7 
5.6 
13 . 5 
22.0 
23.* 



8 . 4 
17 . 1 
2 0.4 
15 . 8 
10. 4 
5 . 2 
6.5 
11.0 
14 .7 
I *. 3 



12 .2 
13.3 
9.6 
6.8 
6. 1 
11.6 
5.4 
8.9 
e . 6 



5. 3 
5. 1 
6.3 
12 . 0 
17.6 
12 . 6 
16.9 
12 ■ 2 



5.2 
10. 6 
15.5 
21.7 
13.3 
16. 1 
10. 3 



6.3 
10.9 
13.0 

3 . 4 
12.2 

7 . 6 



5.4 
11.4 

8.3 
13.9 
11.3 



3 . 2 
3 . 3 
13 . 3 
13.2 



15.7 
19 . 3 

:o. 0 



5.5 
6 . 2 



P9
T11
K15
A16
I18
R20
F22
N24
K26
C30
F33
Y35
S47
D50
C51
R53
R39



14 . 0 
9 . 5 
7.9 
5.5 
6. 1 
10. 6 
15.6 
19.9 
24 . 4 
18.9 
10.3 
8 . 4 
17.6 
20.0 
18 .9 
25.4 
15.4 



11.3 
11.2 
14 . 6 
10. 1 
6.0 
5.. 3 
10.9 
14 . 7 
20.1 
12. 1 

7. ; 

7.4 
10. 6 
13.6 
12.2 
18.6 
16.9 



9.0 
13 . 5 
20. 1 
15.9 
11 . 2 
5.4 
5.6 
9 . 4 
15.2 
4 . 6 
7.7 
9.4 
6.6 
7.2 
4 . 0 
11.0 
17. 1 



12.2 
18.8 



15.4 
22 . 5 



27 . 
25 . 
21 . 



16 . 0 
10.5 
4 . 1 
5.4 
8.3 
12.6 

16 . 4 
17.3 
17.2 
12 . 1 

17 . 2 
24 .9 



31, 

23 , 

24 , 
13 . 



13.3 
19.3 
27.9 
24 . 6 
20.2 
1 4 . 6 



12.3 



7.7 
9.5 
16.4 
21.4 
17.9 
16.3 
12 . 2 
15.0 
27 . 2 



10. 
6, 
9 . 



1 

17.9 
13.4 
13.5 
3.3 
13.0 
24 .0 



7.9 
13.5 
2 1.4 

ia . 6 

14 . 7 
9.8 
C.2 
4 . 3 
10. 1 
5.9 
6.6 
12.2 
12.6 
13.5 
8.3 
15.7 
20. 1 



9 . 2 

12 . 1 

13 . ). 



14 . 
10 . 



6.9 
3 . 1 
10 . 0 
15.3 



3.2 
5.6 
9.5 
10.4 
12.9 
9 . 7 
16,7 
13 . 7 



8 . 7 
5.7 
10. 3 
3 . 6 
7.0 
7 . 3 
1C.3 
14 . 7 
19.0 
14.9 
5.5 
5.3 
15.9 
17.6 
15.3 
3 



22 



13.9 
13.5 
24 . 6 
19 . 3 
15.0 
10.2 
10. 3 
11.4 
17.0 
4.9 
12.2 
14.4 
5.3 
7 . 6 
5.4 
9.7 
22.3 




1: 




© . 9 

Table 36, continued.
Distances in Angstroms between C betas.
Hypothetical C beta was added to each Glycine.





E49 M52 P9 T11 K15 A16 I18 R20
k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue          40     42     50     52     57     71
Possibilities    21  x  21  x  21  x  21  x  21  x  21  = 8.6 x 10^7
Abundance x 10
of PPBD         .763   .271   .459   .671   .600   .459
Product = 1.77 x 10^-8
Parent = 1/(5.5 x 10^7)   least favored = 1/(4.2 x 10^9)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.6 x 10^7
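The "Possibilities" figure is plain counting. A minimal sketch, assuming each of the six variegated codons can yield 21 outcomes (the 20 amino acids plus a stop); the variable names are illustrative.

```python
# Residues varied in this example, as listed in the table above.
varied_residues = [40, 42, 50, 52, 57, 71]

# Assumption: 20 amino acids + 1 stop signal per variegated codon.
outcomes_per_codon = 21

possibilities = outcomes_per_codon ** len(varied_residues)
print(possibilities)           # 85766121
print(f"{possibilities:.1e}")  # 8.6e+07
```

The same count sets a lower bound on how many independent transformants are needed before every encoded sequence can be expected in the population. Because the codon mixes are not uniform, the rarest sequences require considerably more, as the "least favored" figures indicate.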
Overlap = 15 (11 CG, 4 AT)
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age acc **m eta acg tac qcg ace tgc -5' 

"k - equal parts of T and C; v = equal parts of C, 
A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above



Residue 
Possibilities 



38 
A 



41 



43 
9 



4 4 
2 



Abundance x 10 2.5 2.5 .833 
Product =2.3 v i n~B 



51 
21 



. G63 



54 55 
2 1 x 21 x 
= 6.2 
.397 .437 . 



7 2 
2 i 

x 1C 7 
602 



Parent - 1/(4 
Least favored 
at 1 in 1.2 x 



x 10" 

. 4 
or 
10' 



( ) lcar 

one-an ino-ac id 
>7 



t favored - 1/(1. 
Liubst i tut ion f vcz 



25 x : 
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Table 41: vg DNA set;f2 of BPTI 2.3 
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Overlap = 13 (7 CG, 6 AT) 
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**n atg cca cca 



acg gca cga ttc gcg acc ggc 
j esq I I spacer I 

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A, C, G, T;
d = equal parts of A, G, T; v = equal parts of A, C, G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above



Residue 
Possibilities 
1. A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain.

2. The method of claim 1 wherein the population of
replicable genetic packages of step (a) is
obtained by:
certain predetermined degree of affinity for
target material, and the required degree of
affinity is increased for each new variegated
population.

7. The method of claim 1 wherein the displayable
potential binding protein is a chimeric protein.

8. The method of claim 7 wherein said signal is
provided by a segment of said chimeric protein
which is essentially identical in amino acid
sequence with at least a functional portion of a
natural outer surface protein encoded by said
genetic package or a cell naturally infected by
said genetic package, said portion directing the
transport of said chimeric protein to the outer
surface of the genetic package.

9. The method of claim 2 wherein the second sequence
is obtained by operably linking a DNA sequence
encoding a potential outer surface transport
signal to a DNA sequence expressing a protein that
confers a selectable phenotype to obtain a test
construct, introducing the test constructs into
suitable hosts, causing expression of said DNA
construct, selecting genetic packages that display
the protein that confers the selectable phenotype
on their outer surface, and choosing as said
second sequence the DNA sequence encoding the
potential outer surface transport signal of one of
such selected genetic packages; wherein the
potential outer surface transport signals encoded
by the individual test constructs are non-identical.
17. The method of claim 3 in which the binding domain
of the known protein has a known sequence of amino
acids, and the identity and spatial relationship
of the amino acids forming a surface of said
domain is known.

18. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

19. The method of claim 3 wherein the level of
variegation of the population is chosen such that
the packages displaying potential binding domains
obtained by single amino acid substitutions in the
amino acid sequence of the parental potential
binding domain are present in detectable amounts.

20. The method of claim 3 wherein the amino acid
substitutions to be made are chosen after
consideration of the 3D structure of the parental
potential binding domain.
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21. The method of claim 15 wherein the amino acid 
substitutions to be made are for amino acids of 
the chosen domain of the known protein which are 
known to be alterable without reducing the melting 
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affinity separated and retain viability 



30. The method of claim 3 in which the initially
chosen parental potential binding protein has at
least one stable binding domain and said domain
has a melting point of at least 60°C and is stable
over a pH range of at least 3.0-8.0.

31. The method of claim 15 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such enzymatic activity.

32. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.
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33. The method of claim 1 wherein the target contains 
ionizable groups, further comprising providing 
counter ions in affinity separations and the 
solutions of the intended use to reduce 
electrostatic repulsion between the potential 
binding protein and the target. 

34. The method of claim 1 wherein the initial 
potential binding domain is picked so that, under 
the conditions of intended use of the desired 
binding protein and under the conditions of 
affinity separation, that the potential binding 






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thereof embodying an outer surface transport 
signal: 

45. The method of claim 42 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.

46. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor, having the residues
C5, G12, C30, F33, G37, C51 and C55.

47. The method of claim 46 further specifying that: a)
residue 21 contains one of the amino acids Y, F,
W, or I; b) residue 23 contains one of the amino
acids Y or F; c) residue 35 contains one of the
residues Y, F, or W; d) residue 40 contains one of
the amino acids G or A; e) residue 45 contains
either F or Y.

48. The method of claim 47 wherein the residues to be
varied are chosen from among residues 17, 19, 21,
27, 28, 29, 31, 32, 34, 48, 49, and 52.

49. The method of claim 48 wherein the additional
residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35,
47, and 53 are allowed to vary.

50. The method of claim 47 wherein the residues to
vary are picked from one of the interaction sets
identified in Table 34.

51. The method of claim 2 wherein the distribution of






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insensitive to UV, tolerant of desiccation, and 
resistant to a pH of 2.0 to 10.0. 

58. The method of claim 1 wherein the genetic packages
may be frozen and later revived.

59. The method of claim 1 wherein the genetic package
is a cell with a doubling time of 20-40 minutes.

60. The method of claim 1 wherein the genetic package
is a virus with a burst size of at least
100/infected cell.

61. The method of claim 1 wherein the genetic packages
are harvested by centrifugation without loss of
viability.

62. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 60°C.

63. The method of claim 36 wherein the outer surface
transport signal is provided by the LamB protein
or a segment thereof embodying an outer surface
transport signal.

64. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.
is further chosen to yield the largest value for
the quantity ((1 - abundance(stop codons)) times
(abundance of the least abundant amino
acid)/(abundance of the most abundant amino
acid)).
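The quantity named in this claim can be evaluated for any nucleotide mix. What follows is a sketch under stated assumptions: it uses the standard genetic code, and it takes the q, f, and k mixtures defined in the tables as the first, second, and third codon positions. That positional assignment and the function names are illustrative, not taken from the claim.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def codon_distribution(p1, p2, p3):
    """Abundance of each amino acid ('*' = stop) for per-position base mixes."""
    dist = {}
    for a, b, c in product(BASES, repeat=3):
        prob = p1.get(a, 0.0) * p2.get(b, 0.0) * p3.get(c, 0.0)
        if prob > 0.0:
            aa = CODON_TABLE[a + b + c]
            dist[aa] = dist.get(aa, 0.0) + prob
    return dist

def figure_of_merit(dist):
    """(1 - abundance(stop)) * (least abundant aa) / (most abundant aa)."""
    abundances = [dist.get(aa, 0.0) for aa in AMINO_ACIDS]
    return (1.0 - dist.get("*", 0.0)) * min(abundances) / max(abundances)

q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.5, "G": 0.5}
n = {base: 0.25 for base in BASES}  # equimolar mix, for comparison

qfk = codon_distribution(q, f, k)
nnk = codon_distribution(n, n, k)
print(round(figure_of_merit(qfk), 3), round(figure_of_merit(nnk), 3))
```

An amino acid that the mix cannot encode is given abundance zero, so the quantity is zero for any distribution that omits one of the twenty.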

72. The protein of claim 66, wherein the protein 
comprises a first foreign domain recognizing a 
first target material and a second foreign domain 
recognizing a second target material. 
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